

Journal of Research of the National Bureau of Standards    Vol. 49, No. 1, July 1952    Research Paper 2336


Fine Structure in Some Infrared Bands of Methylene Halides


Earle K. Plyler and W. S. Benedict 


The infrared absorption spectra of methylene chloride, methylene bromochloride, and
methylene bromide have been studied with a high-resolution grating spectrometer in the
region from 1.6 to 2.3 microns. The purpose of these measurements was to determine the
molecular constants of these molecules from their spectra. Using the saturated vapor in a
cell 60 centimeters long, it was possible to resolve the fine structure in three to five bands of
each molecule. All the resolved bands are overtones and combinations of fundamental
vibrations localized in the methylene radical, and hence appear at nearly the same
frequencies for each molecule. Bands with Q branches (symmetry types B and C) and without
that feature appear. Application of combination relations to all bands of a given molecule
yields rotational spacings of the ground state agreeing within 0.5 percent. The spacings,
[A″ − (B″ + C″)/2], are, respectively, for CH₂Cl₂, 0.955 cm⁻¹; CH₂BrCl, 0.898 cm⁻¹; and
CH₂Br₂, 0.821 cm⁻¹.

The molecular dimensions cannot be uniquely determined from these data alone, but if
tetrahedral angles are assumed, they lead to improbably low values of the C-halogen
distance, namely, 1.70 A for C-Cl and 1.83 A for C-Br. If the halogen-C-halogen
angle is increased to 112°, as found by electron diffraction, the C-Cl distance is in good
agreement with the electron diffraction distance 1.766 A, and C-Br becomes 1.907 A in CH₂Br₂
and 1.911 A in CH₂BrCl.


The vibrational bands of methylene chloride, methylene bromochloride, and methylene bromide
have been measured in the infrared region, and the bands have been assigned to various modes
of motion.¹ In the spectra of these molecules several overtone and combination bands were
observed in the region 1.6 to 2.3 μ. This region is well suited for observations with the
15,000-line-per-inch grating instrument, using a lead-sulfide detector. A detailed
description of this instrument is given in a previous paper.²

The purpose of this investigation was to observe and analyze the rotational fine structure
of the overtone and combination bands. From such data, it is possible to draw certain
conclusions concerning the interatomic dimensions in the vibrationless ground state of these
three closely related molecules, and to demonstrate the marked similarities in the
interactions of vibration and rotation in the bands in question.

A tungsten-ribbon-filament lamp, with currents from 35 to 40 amperes, was used as the source.
The spectral slit width was about 0.15 cm⁻¹ in the regions of 1.6 and 2.2 μ. The absorption
cell was 60 cm in length, and the vapors were allowed to enter the evacuated cell until a
saturated state was reached at room temperature. For the more intense bands, the pressure
was reduced. The conditions of pressure for each measurement are given in the captions of
the figures.

In order to obtain good accuracy in the measurement of the wavelengths of the rotational
bands of these molecules, atomic emission lines of krypton, argon, and neon were superimposed
on the recorded chart simultaneously with the recording of the absorption spectra. A fairly
large number of lines were available, because the first, second, and third orders of certain
wavelengths would be superimposed on the first-order region under study. When the standard
lines were within 50 A of each other, it was found that measurements on lines between the
standards could be repeated on separate runs to 0.5 A, or 0.1 cm⁻¹, in the 2.2-μ region. All
other rotational lines were measured by means of a dispersion curve, which was determined by
plotting the known wavelengths as a function of the counter number, which indicates the
position of the grating. From this relationship the wavelength of any line could be
determined. Although this method of measurement does not yield as great accuracy as the
measurement of lines that lie close to a superimposed standard, it was found that the wave
number of any line in a band never varied by more than 0.3 cm⁻¹ on separate runs, and
usually by much less than this amount.

¹ Earle K. Plyler and W. S. Benedict, J. Research NBS 47, 202 (1951) RP2245.
² Earle K. Plyler and Norman Gailar, J. Research NBS 47, 248 (1951) RP2249.
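The equivalence quoted above between a wavelength repeatability of 0.5 A and a wave-number repeatability of about 0.1 cm⁻¹ near 2.2 μ follows from ν = 1/λ, so that |Δν| = Δλ/λ². The short sketch below only illustrates that arithmetic; the function name is chosen here for illustration and is not part of the original work.

```python
def wavenumber_uncertainty(wavelength_um: float, delta_angstrom: float) -> float:
    """Return the wave-number uncertainty (cm^-1) for a given wavelength uncertainty."""
    wavelength_cm = wavelength_um * 1e-4   # 1 micron = 1e-4 cm
    delta_cm = delta_angstrom * 1e-8       # 1 angstrom = 1e-8 cm
    # nu = 1 / lambda, hence |d nu| = d lambda / lambda^2
    return delta_cm / wavelength_cm ** 2


# Repeatability of 0.5 A in the 2.2-micron region, expressed in cm^-1
print(round(wavenumber_uncertainty(2.2, 0.5), 3))
```

The same conversion shows why a fixed wavelength error corresponds to a larger wave-number error at the short-wavelength end of the region studied.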

Typical experimental results are shown in figures 1 to 4, inclusive. A number of additional
tracings were obtained and measured, revealing additional weaker bands whose fine structure
could be interpreted in part, but these are not reproduced. Figures 1 and 2 present four
bands of CH₂Cl₂; figure 3, two bands of CH₂BrCl; and figure 4, four bands of CH₂Br₂. It will
be noted that the bands of all three molecules are roughly similar in position and
structure, and that the fine structure is of two types: (1) bands with a central Q branch,
flanked by fairly broad but readily resolved "lines" whose average spacing is approximately
1.64 cm⁻¹ in CH₂Br₂, 1.80 cm⁻¹ in CH₂BrCl, and 1.91 cm⁻¹ in CH₂Cl₂, and (2) bands whose
resolved "lines" show identical spacings with those just listed, but with the central Q
branch absent.

From the approximate values of the interatomic distances and angles (C-H = 1.09 A,
C-Cl = 1.76 A, C-Br = 1.90 A, tetrahedral angles) characteristic of halogenated methanes, it
is possible to calculate trial moments of inertia. For all three molecules the axis of least
moment is roughly parallel to the halogen-halogen line (for the symmetrical molecules, it is
precisely parallel to this line), and the two other moments are much larger in magnitude and
approximately equal. Accordingly, the asymmetry parameter δ = (B − C)/(A − C) is quite small,
taking the values 0.002, 0.004, and 0.010, respectively, for CH₂Br₂, CH₂ClBr, and CH₂Cl₂.
Under these circumstances, the rotational structure will approximate that of a symmetric
top, and the familiar combination relations for determining the molecular constants may be
applied. The two types of observed bands mentioned above are type B and type C bands,
respectively, of the asymmetric rotor, with the change of dipole moment along the
intermediate and greatest axis of inertia, both of which in the limiting case of the
symmetric rotor (when B = C, or δ = 0) become "perpendicular" bands. The observed "lines" are
ᴿQ_K and ᴾQ_K zero branches, each consisting of a number of lines with ΔJ = 0, ΔK = ±1.
These are superposed on the unresolved background of lines with ΔJ = ±1, ΔK = ±1. The peaks
are broadened by several causes: (a) the small degree of asymmetry splits the lines of
different J. This effect is most pronounced for the lines of low K, near the band center,
and makes it difficult to distinguish the lines with K < 2; (b) the different moments of
inertia in the upper and lower vibrational states, as well as the centrifugal distortions of
the molecules, also shift the positions of the lines of different J. These effects are most
pronounced at high K; and (c) the molecules with chlorine and bromine atoms of different
isotopic weights will have slightly different moments of inertia, and hence their lines will
be somewhat displaced, leading to broadening. Quantitative estimates of the importance of
all these effects have been made. It is concluded that they are relatively unimportant, and
that it is permissible to apply the symmetric rotor formulas to the "lines" of K > 2,
thereby deriving moments of inertia referring to molecules with "average" Cl and Br atomic
weights.

Figure 1. The infrared absorption spectra of CH₂Cl₂ vapor for the bands ω₆+ω₂, ω₆+2ω₂, and
ω₆+ω₁. Saturated vapor, 40 cm, was measured in a 60-cm cell. (Ordinate: deflection.)

Figure 2. The spectrum of a 2ω overtone band of CH₂Cl₂. Saturated vapor at room temperature
was used with a 60-cm cell. (Ordinate: deflection.)

Figure 3. The ω₆+ω₂ (4466.3 cm⁻¹) and 2ω absorption bands of CH₂BrCl, with a path of 60 cm
and saturated vapor pressure at room temperature. (Ordinate: deflection.)

Figure 4. The ω₆+ω₂, ω₆+ω₁, and two 2ω absorption bands of CH₂Br₂. The cell length was
60 cm, and the saturated vapor pressure was 44 mm. (Ordinate: deflection.)

The wavelengths and wave numbers of the lines used to derive the band origins and effective
rotational constants are given in tables 1 to 3. The observed values are compared with
values calculated from the formula for a perpendicular band of a symmetric top,

$$\nu_K^{R,P} = \nu_0 + U \pm 2UK + (U - L)K^2,$$

where $\nu_0$ is the band origin, $U = A' - (B' + C')/2$, $L = A'' - (B'' + C'')/2$, and K is
the quantum number for the lower state. The + sign is taken for the ᴿQ_K branch frequency
$\nu_K^R$, and the − sign is for the ᴾQ_K frequency, $\nu_K^P$. Since each band of a given
molecule has a common lower state, the combination differences
$\Delta_2 F''(K) = \nu_{K-1}^R - \nu_{K+1}^P = 4LK$ should be equal, and should be a linear
function of K. These differences, divided by 4K, are plotted as a function of K in figure 5.
It will be noted that the points in figure 5 are essentially constant, and that the
agreement of observed and calculated frequencies is good. The constants are collected in
table 4. The limits of error cited are estimated from the scatter of the points, and result
from a combination of the experimental error in locating and measuring the line positions
and the theoretical approximations made in reducing data for a nonrigid, slightly asymmetric
rotating molecule of compound-isotopic composition by formulas applicable to a rigid,
symmetric monoisotopic molecule.
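The combination-difference argument can be checked numerically. The sketch below is only an illustration: it generates line positions from the symmetric-top formula with made-up constants (the band origin and U are hypothetical; L merely echoes the CH₂Cl₂ spacing) and confirms that Δ₂F″(K)/4K returns the ground-state constant L whatever the upper-state constant is. It does not use the measured lines of tables 1 to 3.

```python
def q_branch_wavenumber(nu0, U, L, K, branch):
    """Perpendicular-band Q-branch position: nu0 + U +/- 2UK + (U - L)K^2."""
    sign = 1 if branch == "R" else -1
    return nu0 + U + sign * 2 * U * K + (U - L) * K ** 2


# Illustrative constants in cm^-1 (NU0 and U are hypothetical assumptions)
NU0, U, L = 4440.0, 0.951, 0.955


def ground_state_spacing(K):
    """Recover L from Delta_2 F''(K) = nu^R_{K-1} - nu^P_{K+1} = 4 L K."""
    upper = q_branch_wavenumber(NU0, U, L, K - 1, "R")
    lower = q_branch_wavenumber(NU0, U, L, K + 1, "P")
    return (upper - lower) / (4 * K)


for K in (3, 6, 12):
    print(K, round(ground_state_spacing(K), 4))
```

Because both lines of each pair end on the same upper-state level, U cancels, which is why bands of one molecule with different upper states should give the same value of L.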

Table 4 also gives other pertinent data for the bands, namely, the band origin $\nu_0$ and
the vibrational convergence $U - L$. These were obtained from the relation

$$\nu_0(K) = \tfrac{1}{2}\left(\nu_K^{R} + \nu_{K+1}^{P}\right) = \nu_0 + (U - L)/2 + (U - L)K(K+1).$$

The straight line resulting from a plot of $\nu_0(K)$ against $K(K+1)$ has as slope $(U - L)$
and as intercept $\nu_0 + (U - L)/2$.

Figure 5. A plot of Δ₂F″(K)/4K as a function of K for each of the molecules CH₂Cl₂, CH₂BrCl,
and CH₂Br₂.
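The straight-line reduction described above can be sketched as a least-squares fit. As before, the line positions are synthetic and the constants are hypothetical; the point is only that the slope returns U − L and that the band origin is the intercept less half the slope.

```python
def mean_position(nu0, U, L, K):
    """(nu^R_K + nu^P_{K+1}) / 2 from the symmetric-top formula."""
    r_line = nu0 + U + 2 * U * K + (U - L) * K ** 2
    p_line = nu0 + U - 2 * U * (K + 1) + (U - L) * (K + 1) ** 2
    return (r_line + p_line) / 2


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative constants in cm^-1 (hypothetical assumptions)
NU0, U, L = 4440.0, 0.951, 0.955

ks = list(range(3, 15))
xs = [k * (k + 1) for k in ks]                    # abscissa K(K + 1)
ys = [mean_position(NU0, U, L, k) for k in ks]    # ordinate nu_0(K)

slope, intercept = fit_line(xs, ys)
convergence = slope                    # U - L
band_origin = intercept - slope / 2    # nu_0
print(round(convergence, 4), round(band_origin, 3))
```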

Included in table 4 are the vibrational data for each band, in terms of the combination, or
overtone, frequency involved (the numbering is that used and described in the article cited
in footnote 1); the rotational type, which confirms this assignment, and the dynamic
interpretation of the frequencies in question are also included. It will be noted that the
vibrational motions all involve the methylene group, $\nu_{Ha}$ and $\nu_{Hs}$ being the
asymmetric and symmetric stretching frequencies, and $\delta_H$ the symmetric CH deformation.
When the combination s+a is asymmetric, we have type C; when it is symmetric (2s or 2a), we
have type B. The fact that corresponding transitions appear at nearly the same frequency in
the three molecules, and with closely corresponding values of the convergence U − L, is
further indication of how well the vibrational motion is localized in the methylene radical.
This has the further result that there is very little vibrational

TABLE 1. Observed and calculated wave numbers of the lines of three bands of methylene
chloride (CH₂Cl₂)

The observed values only are listed for the band ω₆+2ω₂.

Observed and calculated wave numbers, cm⁻¹:
4428. 6 SS06. 
4430. 6 5S08. 
4432.7 5900. 6 
Hd 5902 
4436 5901. : 


4438 
4440 
4442. 
44 
4446. 




4445.6 
4450.6 











TABLE 2. Observed and calculated wave numbers of the lines of two bands of methylene
bromochloride (CH₂BrCl)

Observed and calculated wave numbers, cm⁻¹:


400s. 
6070. 
H071. 4 


e099 
6100 


nowy 
HLOOL § 


6102. 6 
6104. : 
6106. 
6107 
6109. 


6102 
6104. ! 
6106. : 
6107. ¢ 
6109 


611.5 
6113 
6115 
6116. 
6118. 6 


611i 
6112. ' 
6114 
6116. : 
6118. 


6119.5 6'2.3 


TABLE 3. Observed and calculated wave numbers of the lines of three bands of methylene
bromide (CH₂Br₂)

The observed values only are listed for one of the combination bands involving ω₆.

Observed and calculated wave numbers, cm⁻¹:


4432.3 
4444.0 
4435.7 


4437.5 
4439.2 
4440.9 
4442.6 
1444.3 


move. 
6004 
095. 7 


6092 
6004 
6095, 


5913 
5915. ! 


6097 
60g 
6100. 
6102. ! 
6104. 


(097 
60u9 
6100 
6102 
€104. : 


5917. 3 4446.0 
SUIS. 
5920. ! 
5g22. : 
5923 


5953. 
4452.8 5O5S5, 
6105. ¢ 
6107. 5 


A105, ¢ 
6107.7 


5925 46.4 
4456 1 


4466.0 506. 6 
4467.6 
4469.1 
4470.5 


5USS. 6 
5040. 3 
Sul 


5943. 
5985. 
5046. 
59S. 
5949. 


4072.4 


O51. f 
5953. 


5954 


4485 
44866 


4485 
44sy 
4491 
4192 
4404. < 
4495. 6 


TABLE 4. Constants for the molecules of methylene chloride, methylene bromochloride, and
methylene bromide

Molecule

L = A″ − (B″ + C″)/2, cm⁻¹

Band    Type    Vibration

ν_Ha + δ_H
ν_Ha + ν_Hs
2ν_Hs
2ν_Ha
ν_Ha + 2δ_H


isotopic shift due to the Cl or Br isotopes. Calculation predicts that the ν_H's should be
shifted by less than 0.01 cm⁻¹, and δ_H by less than 0.1 cm⁻¹, which is in accordance with
the failure to observe any isotopic shifts.

The determination of a single ground-state rotational constant, A″ − (B″ + C″)/2, in this
work cannot, of course, lead to a determination of the molecular
dimensions. These depend on four or five param- 





CH₂Cl₂            CH₂ClBr           CH₂Br₂

0.955 ±0.003      0.898 ±0.003      0.821 ±0.003


cm cm 
4406.3 0. 004 4461.7 
OO4 5064.8 
O44 6113.4 
5082. 9 


0). 004 


4). 003 6000.9 


eters. The data do, however, restrain the limits 
within which the dimensions must lie. The C-H distance and HCH angle do not appreciably
affect the moments, and may safely be taken as 1.093 × 10⁻⁸ cm, as generally observed in
methane and its derivatives, and the tetrahedral angle, respectively. The observed
A″ − (B″ + C″)/2 then leads to a unique relation between XCX angle and C-X distance for
CH₂Cl₂ and CH₂Br₂. Assuming tetrahedral angles, the C-Cl and C-Br distances would be,
respectively, 1.70 and 1.83 A, improbably low values. If the angle is widened to 112°, as
found by electron diffraction,³ the C-Cl distance becomes 1.766 A, and C-Br is 1.907 A in
CH₂Br₂. With this same value, 112° for BrCCl, and 1.766 A for C-Cl, the C-Br distance in
CH₂BrCl is 1.911 A. Our fine-structure spacings are thus entirely consistent with the
electron diffraction data, and confirm the fact that the halogen-C-halogen angle is
appreciably greater than tetrahedral.
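The relation between the XCX angle, the C-X distance, and the observed spacing can be explored with a small calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it treats a symmetric CH₂X₂ molecule as rigid, uses average atomic weights and the conversion factor h/(8π²c) ≈ 16.8576 amu·A²·cm⁻¹ (both supplied here, not taken from the paper), and relies on the C₂ᵥ symmetry to place the principal axes, so it does not apply to CH₂BrCl.

```python
import math

H_OVER_8PI2C = 16.857629   # amu * angstrom^2 * cm^-1 (assumed conversion factor)
MASS = {"H": 1.008, "C": 12.011, "Cl": 35.453, "Br": 79.904}  # average atomic weights


def spacing_constant(halogen, r_cx, xcx_deg, r_ch=1.093, hch_deg=109.4712):
    """Return A - (B + C)/2 in cm^-1 for a rigid, symmetric CH2X2 molecule."""
    half_x = math.radians(xcx_deg) / 2
    half_h = math.radians(hch_deg) / 2
    # Carbon at the origin; halogens in the yz-plane, hydrogens in the xz-plane.
    atoms = [
        (MASS["C"], 0.0, 0.0, 0.0),
        (MASS[halogen], 0.0, r_cx * math.sin(half_x), -r_cx * math.cos(half_x)),
        (MASS[halogen], 0.0, -r_cx * math.sin(half_x), -r_cx * math.cos(half_x)),
        (MASS["H"], r_ch * math.sin(half_h), 0.0, r_ch * math.cos(half_h)),
        (MASS["H"], -r_ch * math.sin(half_h), 0.0, r_ch * math.cos(half_h)),
    ]
    total_mass = sum(m for m, _, _, _ in atoms)
    z_cm = sum(m * z for m, _, _, z in atoms) / total_mass
    # By symmetry the principal axes are x, y, z through the center of mass.
    i_x = sum(m * (y * y + (z - z_cm) ** 2) for m, x, y, z in atoms)
    i_y = sum(m * (x * x + (z - z_cm) ** 2) for m, x, y, z in atoms)
    i_z = sum(m * (x * x + y * y) for m, x, y, z in atoms)
    a, b, c = sorted((H_OVER_8PI2C / i for i in (i_x, i_y, i_z)), reverse=True)
    return a - (b + c) / 2


print(round(spacing_constant("Cl", 1.766, 112.0), 3))      # wide angle, longer bond
print(round(spacing_constant("Cl", 1.70, 109.4712), 3))    # tetrahedral, shorter bond
print(round(spacing_constant("Br", 1.907, 112.0), 3))
```

Two quite different geometries reproduce nearly the same spacing, which is the sense in which a single rotational constant fixes only a relation between angle and distance.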

Since the completion of this work, a preliminary account of a complete structural
determination of CH₂Cl₂ from microwave spectra has been reported.⁴ The structural
conclusions reached are the same as ours, and the A″ − (B″ + C″)/2 obtained from these more
accurate data, 0.96097 cm⁻¹, is in reasonable agreement with the results reported here.

³ L. O. Brockway, Rev. Modern Phys. 8, 231 (1936).
⁴ W. D. Gwinn and R. J. Myers, Symposium on Molecular Structure, Ohio State University,
June (1951).

Washington, April 1, 1952.







Journal of Research of the National Bureau of Standards    Vol. 49, No. 1, July 1952    Research Paper 2337


Overlapping Dissociation Constants of 4,4'-Diaminobenzophenone from Spectral-Absorbancy
Measurements¹

Elizabeth E. Sager and Iris J. Siewers


Dissociation constants and related thermodynamic quantities can be calculated directly 
from spectral-absorbancy measurements and known hydrogen-ion concentrations, when the 
compound, whether base or acid, has only one group capable of accepting or donating a proton. 
When two groups are present, one may also calculate the constants of the reaction directly if 
the ratio of the two constants is large, that is, if the pK values are separated by several units.
If the two reactions closely overlap, the ratio is small, and a complicated series of
approximations is necessary.

An example of the latter type of reaction is the overlapping dissociation of the ions of 4,4’- 
diaminobenzophenone. The spectral-absorbancy curves of the undissociated and of the 
completely dissociated species can be obtained experimentally. The spectral curve repre- 
senting the one-group dissociated species cannot be measured. The bands of the three 
species are superimposed upon one another and, during the overlapping reactions, each of the 
three species contributes to the observed absorbancy values. Using a series of solutions with 
very small differences in hydrogen-ion concentrations, the two constants were first calculated 
from measurements at the very beginning and at the very end of the reactions. Equations 
were developed, using the constants, activity-coefficient terms following the simple
Debye-Hückel relationships for the first constant, and the hydrogen-ion concentrations, to determine
the relative amounts of each species at any stage during the dissociation. The validity of 
the method was proved by its application to all intermediate data where the two reactions 
overlap. The agreement of the calculated sums of absorbancy with the observed values is 
well within experimental error. The method will be useful for many substances where 
electromotive-force methods cannot be applied. 


1. Introduction

The diphenylketones are important intermediates in the synthesis of many organic compounds.
They are so slightly soluble in water that several physicochemical methods of approach in
studying their behavior are precluded. However, in most cases, small quantities of the order
of 2 to 20 mg can be dissolved in 1 liter of water at room temperature. The solutions are
then 10⁻⁵ or 10⁻⁴ molar, which are ideal for spectrophotometric measurements. Aromatic
compounds of this type show absorption in the ultraviolet range of the spectrum, and their
equilibrium constants and other related thermodynamic quantities can be calculated from
spectrophotometric measurements made under carefully controlled conditions.

The spectrophotometric method is straightforward and precise for a substance, whether base
or acid, in which only one group is capable of accepting or donating a proton.
Representative of this single-group dissociation is the reaction of 4-aminobenzophenone with
hydrochloric acid, a subject of recent study by the authors [1].²

If two groups are present, and their dissociation constants are separated by several units,
as in the case of p-hydroxybenzoic acid, the procedure is still straightforward [2]. This
acid has a carboxyl group ionizing between pH 3 and 5, and a hydroxyl group ionizing between
pH 8 and 10. The three limiting spectral-absorbancy curves representing the un-ionized,
one-group ionized, and the completely ionized compound can be obtained experimentally.³ The
two constants can be calculated from separate series of absorbancy curves and hydrogen-ion
concentrations in the usual manner.

A more complicated picture is presented when an amino compound, such as
4,4'-diaminobenzophenone, reacts with a strong acid, such as hydrochloric acid, in two
stages, and the two cations formed by addition of hydrogen ion dissociate according to the
reaction, as follows:

⁺H₃N-C₆H₄-CO-C₆H₄-NH₃⁺ ⇌ H₂N-C₆H₄-CO-C₆H₄-NH₃⁺ + H⁺

H₂N-C₆H₄-CO-C₆H₄-NH₃⁺ ⇌ H₂N-C₆H₄-CO-C₆H₄-NH₂ + H⁺

For simplicity in writing all equations that follow, the above reactions may be expressed
thus:

R⁺⁺ ⇌ R⁺ + H⁺    (1)

R⁺ ⇌ R + H⁺    (2)

The constants for the equilibria (1) and (2) are the first and second acidic dissociation
constants, respectively, for the doubly charged ion.

¹ This paper was presented before the Section of Physical and Inorganic Chemistry of the
Twelfth International Congress of Pure and Applied Chemistry held in New York City,
September 1951.
² Figures in brackets indicate the literature references at the end of this paper.
³ When the carboxyl group is ionized, the band of maximum absorbancy occurs at lower
wavelengths than that of the un-ionized acid. With ionization of the hydroxyl group, a much
greater shift of maximum absorbancy in the opposite direction is found.








The spectral-absorbancy curves representing R⁺⁺ and R can be obtained experimentally, but
the curve for R⁺ cannot be obtained. Extensive data in which very small differences in
hydrogen-ion concentration are used, which result in very small differences in absorbancy,
enable one, however, to calculate a theoretical curve for R⁺. Upon close examination of the
series of curves at the very beginning, and at the very end of the reactions, it is evident
that several curves share common isosbestic points. Assuming that in each case only two
species are involved in one reaction, provisional values of each constant may be calculated,
using the absorbancy values and known hydrogen-ion concentrations at the beginning and at
the end of the series. It will be shown later that use of the simple Debye-Hückel limiting
law to express the activity-coefficient terms for the first constant is adequate.


2. Experimental Details 
2.1 Materials 


4,4'-Diaminobenzophenone was obtained from
Eastman Kodak Co. It was twice recrystallized 
from ethyl alcohol and water. The yellow crystals 
melted at 244.1° to 244.4° C. 

Conductivity water was used to dissolve the compound and to prepare all solutions.
Hydrochloric acid of reagent grade was used to make stock solutions, from which the lower
concentrations were prepared.

2.2 Equipment


A model DU Beckman spectrophotometer was 
modified with a constant-temperature cell compart- 
ment of the authors’ design made in the Instrument 
Shop of the Bureau. The cell assemblies consisted 
of Pyrex cylinders, 38 mm in diameter, with remov- 
able crystalline-quartz end plates, which were held 
together in snotel containers with screw caps, metal 
inner sleeves, and rubber and Bakelite gaskets. 
Temperature within the cell compartment was con- 
trolled to within +0.1° C, and all measurements 
reported in this paper were made at 25° C. 

A commercial glass-electrode assembly was used to measure the pH of all solutions as a check
on the hydrogen-ion concentrations.


3. Method of Calculation 


Some of the absorbancy curves calculated from spectrophotometric measurements of a series of
solutions of 4,4'-diaminobenzophenone at various hydrogen-ion concentrations are shown in
figure 1. Thirty-six solutions were measured throughout the ultraviolet, but not all curves
are shown in the figure, as mechanical difficulties were encountered in reproducing them in
one drawing. Curve 1 represents the neutral base and curve 36, the doubly charged ion.

The constants for the first and second dissociations, as expressed in eq 1 and 2, are
calculated as follows:








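$$K_1 = \frac{a_{R^+}\,a_{H^+}}{a_{R^{++}}} = \frac{[R^+][H^+]}{[R^{++}]}\,\frac{f_{R^+} f_{H^+}}{f_{R^{++}}} = K_{1c}\,\frac{f_{R^+} f_{H^+}}{f_{R^{++}}} \tag{3}$$

$$K_2 = \frac{a_{R}\,a_{H^+}}{a_{R^+}} = \frac{[R][H^+]}{[R^+]}\,\frac{f_{R} f_{H^+}}{f_{R^+}} = K_{2c}\,\frac{f_{R} f_{H^+}}{f_{R^+}} \tag{4}$$

in which a denotes activity and f denotes activity coefficient with the appropriate
subscripts, concentrations are denoted by brackets, and $K_{1c}$ and $K_{2c}$ represent the
concentration constants for the first and second reactions.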

The relative amounts of R⁺⁺, R⁺, and R at any stage during the overlapping dissociations may
be calculated from the following equations developed in a manner similar to a method
described by Clark [3]. He applied it to an acid of the type HAH dissociating stepwise to
HA⁻ and A⁻⁻, using electromotive-force measurements. However, the calculation of the ratios
of R⁺ to R⁺⁺ and of R to R⁺ from spectrophotometric measurements is more involved than
approximations from emf measurements.

Let [S] represent the original concentration of the diaminobenzophenone, which does not
change during the dissociation, but is the sum of all species present at any stage of the
reaction.


Then

$$[S] = [R^{++}] + [R^+] + [R]. \tag{5}$$

Let

$$\alpha_1 = [R^{++}]/[S], \tag{6}$$

$$\alpha_2 = [R^{+}]/[S], \tag{7}$$

$$\alpha_3 = [R]/[S]. \tag{8}$$

The sum of $\alpha_1$, $\alpha_2$, and $\alpha_3$ is 1.

It is reasonable to assume that the combined activity-coefficient term,
$f_R f_{H^+}/f_{R^+}$, in eq 4 differs very little from unity, especially in such dilute
solutions, and may be taken as unity. It thereby cancels out, and $K_2$ is equal to
$K_{2c}$. The evaluation of the combined terms, $f_{R^+} f_{H^+}/f_{R^{++}}$, in eq 3 will be
considered later, and for simplicity will henceforth be represented as f. The concentration
constant, $K_{1c}$, will change with change in ionic strength, and is equal to $K_1/f$.

By rearrangement of eq 3 and 4, and the above considerations, the concentrations of the
three species may now be expressed as follows:

$$[R^{++}] = \frac{[R^+][H^+]}{K_1/f} = \frac{[R][H^+]^2}{(K_1/f)K_2}, \tag{9}$$

$$[R^{+}] = \frac{[R][H^+]}{K_2} = \frac{(K_1/f)[R^{++}]}{[H^+]}, \tag{10}$$

$$[R] = \frac{K_2[R^+]}{[H^+]} = \frac{(K_1/f)K_2[R^{++}]}{[H^+]^2}. \tag{11}$$

Figure 1. Spectral absorbancy curves representing the overlapping dissociation of the ions
of 4,4'-diaminobenzophenone. Curve 1 is for the neutral base and curve 36, the
doubly-charged species. (Axes: molar absorbancy against wave number, cm⁻¹, and wavelength,
mμ.)


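Combining the above relationships with eq 5, 6, 7, and 8, the following expressions are
obtained:

$$\alpha_1 = \frac{[R^{++}]}{[R^{++}] + \dfrac{[R^{++}](K_1/f)}{[H^+]} + \dfrac{(K_1/f)K_2[R^{++}]}{[H^+]^2}}, \tag{12}$$

$$\alpha_2 = \frac{[R^{+}]}{\dfrac{[R^+][H^+]}{K_1/f} + [R^+] + \dfrac{K_2[R^+]}{[H^+]}}, \tag{13}$$

$$\alpha_3 = \frac{[R]}{\dfrac{[R][H^+]^2}{(K_1/f)K_2} + \dfrac{[R][H^+]}{K_2} + [R]}. \tag{14}$$

By dividing the right-hand side of eq 12, 13, and 14 by [R⁺⁺], [R⁺], and [R], respectively,
and simplifying, we have

$$\alpha_1 = \frac{[H^+]^2}{[H^+]^2 + (K_1/f)[H^+] + (K_1/f)K_2}, \tag{15}$$

$$\alpha_2 = \frac{(K_1/f)[H^+]}{[H^+]^2 + (K_1/f)[H^+] + (K_1/f)K_2}, \tag{16}$$

$$\alpha_3 = \frac{(K_1/f)K_2}{[H^+]^2 + (K_1/f)[H^+] + (K_1/f)K_2}. \tag{17}$$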
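Equations 15 to 17 translate directly into a few lines of code. The sketch below is a minimal illustration; the function name and the numerical inputs are chosen here for the example (the value of K₂ is the rounded figure given later in the paper, while K₁/f and [H⁺] are hypothetical).

```python
def species_fractions(h, k1c, k2):
    """Return (alpha1, alpha2, alpha3), the fractions of R++, R+, and R.

    h   : hydrogen-ion concentration, mole per liter
    k1c : first concentration constant, K1/f
    k2  : second dissociation constant
    """
    denominator = h * h + k1c * h + k1c * k2
    alpha1 = h * h / denominator        # eq 15
    alpha2 = k1c * h / denominator      # eq 16
    alpha3 = k1c * k2 / denominator     # eq 17
    return alpha1, alpha2, alpha3


# Illustrative inputs only (k1c and h are assumptions, not measured values)
a1, a2, a3 = species_fractions(h=5.0e-3, k1c=0.028, k2=0.0012)
print(round(a1, 3), round(a2, 3), round(a3, 3))
```

At intermediate acidities all three fractions are appreciable, which is the overlapping situation the method is designed to handle.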


The law of absorption states that, at any given wavelength,

$$a_M = A/(bM), \tag{18}$$

where $a_M$ is the molar absorbancy index of a pure compound or species (frequently called
molecular extinction coefficient, ε), A is the specific absorbancy (−log₁₀ transmittancy) of
the compound in solution, b is the depth in centimeters through which the radiant energy
passes, and M is the concentration of the absorbing compound in moles per liter.

Let $a_{M(R^{++})}$, $a_{M(R^{+})}$, and $a_{M(R)}$ represent the molar-absorbancy index of
the doubly charged ion, the one-group dissociated species, and the neutral base,
respectively. Let $a_{M(obs)}$ be the molar-absorbancy index at any known hydrogen-ion
concentration at any observed stage during the dissociations. If only two species are
present, the ratio of their concentrations may be calculated at any given wavelength if the
molar absorbancy index of each species is known at that wavelength, and if the observed
absorbancy is the sum of the absorbancies of each component species. As $a_{M(R^{+})}$ is
unknown, a means must be found to determine it before the ratio of [R⁺] to [R⁺⁺], or of [R]
to [R⁺], can be calculated. Close examination of the absorbancy curves for the 36 solutions
shows that practically identical isosbestic points are obtained for solutions 1 to 11,
inclusive, and for 29 to 36, inclusive. This is demonstrated in figure 2. It may therefore
be assumed that in each case only two species are involved; thus the second constant may be
calculated from the absorbancy data of solutions 1 to 11, and the first constant may be
calculated from the data of solutions 29 to 36. Inasmuch as activity-coefficient terms must
be considered in determining $K_1$, it is more convenient to first calculate $K_2$.

According to the law of absorption, the concentration of a particular species is
proportional to its molar-absorbancy index. It follows, then, that if only R⁺ and R are
present, the ratio, [R]/[R⁺], can be calculated as in eq 19:

$$\frac{[R]}{[R^+]} = \frac{a_{M(obs)} - a_{M(R^+)}}{a_{M(R)} - a_{M(obs)}}. \tag{19}$$

Equation 19 combined with eq 4 results in the following:

$$K_2 = [H^+]\,\frac{a_{M(obs)} - a_{M(R^+)}}{a_{M(R)} - a_{M(obs)}}. \tag{20}$$


As stated previously, $a_{M(R^+)}$ cannot be obtained experimentally. It can be determined,
however, by an extrapolation procedure similar to that used by Weil and Morris [4]. If
$a_{M(obs)}$ is plotted on the Y-axis and $[a_{M(R)} - a_{M(obs)}]/[H^+]$ on the X-axis, the
intercept on the Y-axis should be $a_{M(R^+)}$. The calculation of values for $a_{M(R^+)}$
may also be made at any wavelength, by simply equating the absorbancy relationships and
hydrogen-ion concentrations of any pair of curves, using solutions 1 to 11, inclusive. From
values at many wavelengths a molar-absorbancy curve for the species R⁺ is obtained, as shown
in figure 3. The curves for R⁺⁺ and R are also given in the figure for comparison.

Figure 2. Absorbancy values showing isosbestic points for solutions 1 to 11, and for
solutions 29 to 36. Dotted lines represent values for solutions 12 to 28, inclusive, the
intermediate data where the two dissociations overlap. (Axes: molar absorbancy against wave
number, cm⁻¹.)

Figure 3. Molar absorbancy values obtained experimentally for the neutral base, R, and for
the doubly-charged species, R⁺⁺, and calculated values for the singly-charged species, R⁺.
(Axes: molar absorbancy against wavelength, mμ.)
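The extrapolation for the unmeasurable index of R⁺ follows from rearranging eq 20 into the straight line a_M(obs) = a_M(R⁺) + K₂·[a_M(R) − a_M(obs)]/[H⁺]. The sketch below illustrates this with synthetic data: the two absorbancy indices and K₂ are the values quoted in the paper for 338 mμ, whereas the hydrogen-ion concentrations and the helper names are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


A_R, A_R_PLUS, K2 = 23600.0, 15600.0, 0.00117          # values quoted for 338 m-mu
H_VALUES = [1.0e-4, 1.6e-4, 2.0e-4, 2.6e-4, 3.3e-4]    # hypothetical [H+], mole/liter


def observed_index(h):
    """Synthetic a_M(obs) for a mixture of R+ and R only."""
    fraction_r = K2 / (K2 + h)      # from K2 = [H+][R]/[R+] with [R] + [R+] = 1
    return A_R_PLUS * (1.0 - fraction_r) + A_R * fraction_r


ys = [observed_index(h) for h in H_VALUES]                 # Y-axis: a_M(obs)
xs = [(A_R - y) / h for y, h in zip(ys, H_VALUES)]         # X-axis: [a_M(R) - a_M(obs)]/[H+]
slope, intercept = fit_line(xs, ys)
print(round(intercept), round(slope, 5))   # intercept estimates a_M(R+), slope estimates K2
```

With real data the scatter of the points about the line gives a direct check on the two-species assumption.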

The average of several calculated values of $a_{M(R^+)}$ at wavelength 338 (the wavelength
of maximum absorption for the neutral base) is 15600, whereas from experimental
observations, $a_{M(R)}$ is 23600.

Before $K_2$ can be calculated, the relative amount of R⁺ at each hydrogen-ion concentration
is determined (see table 1). This amount multiplied by the molar concentration of the base
used, namely, 2.506 × 10⁻⁵ in this case, gives an equivalent to be subtracted from the
hydrogen-ion concentration of the hydrochloric acid, thus giving a corrected hydrogen-ion
concentration. This figure, multiplied by the ratio [R]/[R⁺], gives the second dissociation
constant. It is readily seen that the values in the last column are practically constant,
and therefore the assumption that the activity-coefficient term $f_R f_{H^+}/f_{R^+}$ is
unity appears to be valid. At 25° C, $K_2$ is 0.0012 mole per liter, in round numbers, and
−log $K_2$, or p$K_2$, is then 2.92.
The concentration constants for the first dis. 
sociation may now be calculated, using the absorb- 
ancy values and hydrogen-ion concentrations of 
solutions 28 to 36, inclusive. The isosbestic points 
shared by these solutions indicate that only two 
species are involved, namely, [R*] and [R**]. The 
ratios [R*}/[R**] may be calculated according to 
eq 21, in which, 


[R*] __ Um (obs) — Ea ir**) | (2) 
[R**} - 


Om cr*) — €4 (obs) 


The results are shown in table 2. In this case 


the hydrogen-ion concentration of the hydrochloric 


⁴ Although the spectral-absorbancy curve for solution 28 did not pass through the isosbestic
point shared by solutions 29 to 36, it was used to calculate the K₁c values.


TABLE 1. Calculation of K₂c from absorbancies at wavelength 338 and known hydrogen-ion
concentrations

a_M(obs) = 15600[R⁺] + 23600[R]; [R⁺] + [R] = 1

a_M(obs)    [R⁺]      Hydrochloric acid        [H⁺]         K₂c
                      concentration (×10⁻⁵)    (×10⁻⁵)
…           0.0000    …                        …            …
…           0.0775    10.03                    …            0.00117
…           0.1175    16.05                    15.76        0.00118
…           0.1450    20.06                    19.70        0.00116
…           0.1813    26.18                    25.73        0.00116
…           0.2150    33.10                    …            0.00119
…           0.2612    42.13                    …            0.00117
…           0.2800    46.14                    …            0.00117
…           0.2988    51.15                    …            0.00118
21,010      0.3238    57.17                    …            0.00118
20,740      0.3575    66.20                    …            0.00117


TABLE 2. Calculation of $K_{1c}$ from absorbancies at wavelength 338 and known hydrogen-ion concentrations

$a_{M(\mathrm{obs})} = 460[R^{++}] + 15600[R^+]$; $[R^{++}]+[R^+]=1$

(… marks an entry that is not legible in the source; the solution numbers and the columns for $K_{1c}$ and $pK_{1c}$ are not legible.)

a_M(obs)    [R+]     [R+]/[R++]    Hydrochloric acid    (H+)
                                   concentration
                                   (x 10^-5)            (x 10^-5)

6,640       0.408    0.689         4,082                4,078
5,900       .359     .560          4,840                4,836
5,160       .310     .449          5,756                …
4,570       .271     .372          6,756                …
4,020       .235     .307          7,835                …
3,610       .208     .263          …                    …
3,190       .180     .220          …                    …
2,900       …        .192          …                    …


acid is corrected by the amount necessary to convert
R⁺⁺ to R⁺ and R⁺ to R. The ratio [R⁺]/[R⁺⁺]
is multiplied by the hydrogen-ion concentration to
give the concentration constant, $K_{1c}$. It is readily
seen that as the ionic strength increases, the
concentration constants decrease in an orderly manner.
The values of $-\log K_{1c}$, or $pK_{1c}$, are also given in
the table.

$pK_{1c}$ is now plotted as a function of ionic strength,
as shown in figure 4, to determine whether an
extrapolation to infinite dilution can be made. Inasmuch
as there are no values for the low ionic strengths,
it is impossible to make such an extrapolation without
some consideration of the activity-coefficient terms,
$f_{R^+}f_{H^+}/f_{R^{++}}$. As a first estimate of the terms, the
Debye-Hückel law may be applied. It becomes in
this case

$$-\log f = -2A\sqrt{\mu}, \qquad (22)$$

where $A$ is the constant 0.5092 at 25° C [5], and $\mu$
is the ionic strength. The constant $K_1$ is equal to
$K_{1c}f$; therefore,

$$pK_1 = pK_{1c} - 1.018\sqrt{\mu}. \qquad (23)$$
When the values for the different ionic strengths
are plotted, they lie on a practically straight line,
which may be extrapolated to infinite dilution. It
is reasonable to assume that use of the extended
Debye equation

$$-\log f = \frac{-2A\sqrt{\mu}}{1+Ba^{*}\sqrt{\mu}}, \qquad (24)$$

in which $B$ is the constant with a value of 0.3281
at 25° C [5] and $a^{*}$ is an adjustable parameter,
might give a better extrapolation. However, the
same value for $pK_1$ is found when 1, 2, or 4 is assigned








FIGURE 4. $pK_{1c}$ plotted as a function of ionic strength, and
extrapolation to $pK_1$ using the Debye-Hückel equation
($-\log f = -2A\sqrt{\mu}$). Abscissa: ionic strength.


empirically to $a^{*}$. $pK_1$ is 1.367 at 25° C, and $K_1$ is
therefore 0.043 mole per liter.
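The limiting-law correction and the conversion from $pK_1$ to $K_1$ can be sketched as follows. This is a minimal illustration under the assumption that the correction is read as $pK_1 = pK_{1c} - 2A\sqrt{\mu}$ with $A$ = 0.5092; the function names and the sample input (1.5 at ionic strength 0.04) are hypothetical and are not data from the paper.

```python
import math

A_DEBYE = 0.5092  # Debye-Hückel constant at 25 deg C, as quoted in the text


def pk_thermodynamic(pk_conc, ionic_strength, a_const=A_DEBYE):
    """Apply the limiting-law correction pK1 = pK1c - 2*A*sqrt(mu)."""
    return pk_conc - 2.0 * a_const * math.sqrt(ionic_strength)


def k_from_pk(pk):
    """Convert pK to the dissociation constant in mole per liter."""
    return 10.0 ** (-pk)


if __name__ == "__main__":
    # Hypothetical point, only to show the size of the correction.
    print(round(pk_thermodynamic(1.5, 0.04), 4))
    # The reported pK1 of 1.367 corresponds to K1 of about 0.043.
    print(round(k_from_pk(1.367), 3))
```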

The validity of the calculated constants and
activity-coefficient terms may next be subjected to
test in the intermediate data where the dissociations
overlap, and where both constants are involved.
A first approximation of the relative amounts of the
three species R⁺⁺, R⁺, and R, is given in table 3.
The original concentration of hydrochloric acid is
used as the ionic strength, and $f$ is calculated
according to eq 22, for each solution. The values $K_1/f$,
$[H^+]^2$, $[H^+](K_1/f)$, and $(K_1/f)K_2$ are then
calculated and are given in columns 3, 4, 5, and 6.
The amounts of the three species are calculated
according to eq 15, 16, and 17, and are shown in
columns 8, 9, and 10 as $\alpha_1$, $\alpha_2$, and $\alpha_3$.
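The column arithmetic of table 3 can be sketched as below. Eq 15 to 17 are not reproduced in this part of the paper, so the sketch assumes they have the usual form for a two-step dissociation, in which each of the three terms $[H^+]^2$, $[H^+](K_1/f)$, and $(K_1/f)K_2$ is divided by their sum. The absorbancy sum uses the molar absorbancies 460, 15600, and 23600 of eq 25. Function names are hypothetical.

```python
# Sketch of the species fractions (assumed form of eq 15 to 17).


def species_fractions(hydrogen_ion, k1_over_f, k2):
    """Return (alpha1, alpha2, alpha3) for R++, R+, and R."""
    term_r_pp = hydrogen_ion ** 2          # column 4
    term_r_p = hydrogen_ion * k1_over_f    # column 5
    term_r = k1_over_f * k2                # column 6
    total = term_r_pp + term_r_p + term_r  # column 7
    return term_r_pp / total, term_r_p / total, term_r / total


def calculated_absorbancy(alpha1, alpha2, alpha3):
    """Sum of the absorbancies contributed by the three species."""
    return 460.0 * alpha1 + 15600.0 * alpha2 + 23600.0 * alpha3


if __name__ == "__main__":
    # First legible row of table 3: (H+) = 51.2e-5, K1/f = 0.04077.
    fractions = species_fractions(51.2e-5, 0.04077, 0.0012)
    print([round(x, 4) for x in fractions])
    print(round(calculated_absorbancy(*fractions)))
```

For the first row the three terms come out as 0.026×10⁻⁵, 2.087×10⁻⁵, and 4.892×10⁻⁵, which agrees with the legible entries in columns 4, 5, and 6.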

A second approximation is given in table 4. The 
ionic strength now includes all ions and is very 
slightly changed. A recalculation of f is made for 





TABLE 3. First approximation of relative amounts of R⁺⁺, R⁺, and R from intermediate data

Columns: (1) Solution; (2) original concentration of hydrochloric acid, taken as (H⁺) ᵇ; (3) $K_1/f$ ᵃ; (4) $[H^+]^2$; (5) $[H^+](K_1/f)$; (6) $(K_1/f)K_2$; (7) sum of columns 4, 5, 6; (8) $\alpha_1$ = [R⁺⁺]/[S]; (9) $\alpha_2$ = [R⁺]/[S]; (10) $\alpha_3$ = [R]/[S].

(… marks an entry that is not legible in the source; columns 1 and 7 to 10 are not legible.)

(2)            (3)        (4)          (5)          (6)
(x 10^-5)                 (x 10^-5)    (x 10^-5)    (x 10^-5)

 51.2          0.04077    0.026        2.087        4.892
 57.2          .04066     .033         2.326        4.879
 66.2          …          .044         2.680        …
100.3          …          .101         4.004        4.790
130            …          .169         5.136        4.741
161            …          …            …            …
200            …          …            …            …
256            …          …            …            …
331            …          …            …            …
406            …          …            …            …

ᵃ $f$ is provisionally calculated using the concentration of HCl as the ionic strength.
ᵇ Hydrogen-ion concentration is provisionally calculated from the concentration of HCl.





TABLE 4. Second approximation of relative amounts of R⁺⁺, R⁺, and R, and calculated molar absorbancies compared with
observed values


Caleu- 

+ 1,600, Be lated 
au ay sum 

au 


Sum of 
columns 
41,5, 6 


Solu 
tion 


13 


055 x 10 sv2 x 10 v2xKI0" «60 70 q i, 570 
223 STS we : 77 % 15, § 

643 SAT 7. 43 5, 15, 3 

we ™ S44 M3 ’ 12, 7% 
5. O87 Tal OO4 Il, 


9,180 


él ‘ pest nuy 173 z SAT . ‘ v4 &, 610 
a “ 7. 670 Hs 2.7 : d ¥, q , 080 
2h : 5 9. 60s Sel 4 : 5 7 : 4 f 7, 400 
31 wi 5s : 5 : , 2 5,4 . TOO 
Hey wea 45 r ‘ i .4 i, 110 


510 5 176 7] 25. 4: 727 72| 58 uf 08 5,450 15 li 
ot 7 22 sis Pre} a, y 7 7 n 3.2 600 , 
so : 27. 845 Isz A 7 . 2, 57 3,980 13.95 mal 
1000 , Os! 7.92 aT 7 5 . 2. 3,150 - 
144i ‘ ; 71.17 55 55 ‘ " "650 rate 
espt 
1, 25 5 +. S35 ; 5 ‘ : . 180 9 
2, 002 ‘ ; ; 7 5 5 7 9, , 40 ' Ww 
> 5 * > me te . - ™ hl 
ett — = ! " =| |. so | Som) at rhe 
$\mu = ([H^+]+[Cl^-]+[R^+]+4[R^{++}])/2$; $f = f_{R^+}f_{H^+}/f_{R^{++}}$; $-\log f = -2A\sqrt{\mu} = -1.018\sqrt{\mu}$; $K_1 = 0.043$; $K_2 = 0.0012$.


each stage in the overlapping dissociation, and the
values are given in column 3. Slightly different
values for $\alpha_1$, $\alpha_2$, and $\alpha_3$ are now obtained, as shown
in columns 8, 9, and 10. The absorbancy contributed
by each species may now be calculated, by
multiplying the relative amounts of each species
by its respective molar-absorbancy index. These
values are given in columns 11, 12, and 13. The
sum of the absorbancies of the three species should
be the total absorbancy, which should agree with
the observed value. For example, at wavelength
338:

$$a_{M(\mathrm{obs})} = 460\,\alpha_1 + 15600\,\alpha_2 + 23600\,\alpha_3. \qquad (25)$$

The agreement of the calculated sums of absorbancies
with the observed values is satisfactory. When
one considers that the molar absorbancies are calculated
from observed transmittancy or absorbancy
readings, which can easily be in error ±1 in the third
figure from error in setting the dials and reading the
scales, aside from any additional errors in preparation
of the solutions, the fourth significant figure
should probably not be reported. However, if the
values are rounded off to the third figure, there would
be perfect agreement in 8 of the 19 cases. Because
the agreement is well within experimental error, the
validity of the constants is substantiated, and the
relationships employed throughout are essentially
correct.

4. References

[1] Elizabeth E. Sager and Iris J. Siewers, J. Research NBS 45, 489 (1950) RP2162.
[2] Elizabeth E. Sager, Marjorie R. Schooley, Alice S. Carr, and S. F. Acree, J. Research NBS 35, 521 (1945) RP1686.
[3] W. Mansfield Clark, The determination of hydrogen ions, p. 26 (Williams & Wilkins Co., 1928).
[4] I. Weil and J. C. Morris, J. Am. Chem. Soc. 71, 3123 (1949).
[5] G. G. Manov, R. G. Bates, W. J. Hamer, and S. F. Acree, J. Am. Chem. Soc. 65, 1765 (1943).

WASHINGTON, February 1, 1952.


Journal of Research of the National Bureau of Standards


Vol. 49, No. 1, July 1952 Research Paper 2338


Calibrating Wavelengths in the Region From 
0.6 to 2.6 Microns 


Nicolo Acquista and Earle K. Plyler 


The wavelengths of twenty absorption bands have been measured on a grating
spectrometer in the region from 0.6 to 2.6 microns. Also, several emission lines of krypton have
been measured in the near infrared region. The purpose of this investigation was to make
available additional calibration points for prism instruments.

The bands which were selected for calibration are parts of the absorption spectra of
didymium glass, 1,2,4-trichlorobenzene, carbon disulfide, and polystyrene.
Five graphs of the measured spectra are included, and the calibrated wavelengths are
marked on the bands in order to facilitate identification.


In order to calibrate prism instruments adequately,
many absorption bands and emission lines of accurately
known wavelengths are needed. This is
especially so in the wavelength region between 0.6
and 2.6 μ, where the wave number span is 16,000 cm⁻¹.
The purpose of this investigation was to provide
additional calibration points in the near infrared
region for use with prism instruments. Previously,
the problem of obtaining calibration points in the
infrared region has been studied by Oetjen, Chao-Lou
Kao, and Randall.¹ More recently, Plyler and
Peters² have added calibration points covering
the region from 2 to 25 μ. There are still several
gaps where further calibration points would be highly
desirable, in addition to the combined results given
in the articles cited in footnotes 1 and 2.

The calibration of other points in the near infrared
region from the visible to 2.5 μ has been neglected
because no fundamental vibrational absorption bands
have wavelengths less than 2.5 μ. For this reason,
most observations commence at 2 μ. Recently, the
importance of combination and harmonic bands has
been recognized in the analysis of spectra, and
measurements are now being made at wavelengths
less than 2 μ. More accurate calibrating wavelengths
are necessary in order to fully utilize the
high sensitivity of instruments which are equipped
with lead sulfide (PbS) cells. The lead sulfide cell
has been found to be very sensitive for the detection
of radiant energy from 0.6 to 2.6 μ and thus allows a
much greater accuracy and higher resolution than
hitherto possible for prism instruments. The wavelengths
listed in the present study should assist in
relieving this situation.

The absorption bands selected for calibration
were measured on a high-dispersion grating spectrometer.
The spectrometer has a 15,000-line-per-inch
grating as the dispersing unit, and the collimating
mirror has a focal length of 102 cm. A lead-sulfide
photoconducting cell detects the radiant energy
passing through the exit slit of the spectrometer. A
more complete description of the grating spectrometer
has been given in a previous paper and
will not be repeated here.³


¹ R. A. Oetjen, Chao-Lou Kao, and H. M. Randall, Rev. Sci. Instr. 13, 515 (1942).
² Earle K. Plyler and C. Wilbur Peters, J. Research NBS 45, 492 (1950) RP2159.
³ Earle K. Plyler and Norman Gailar, J. Research NBS 47, 248 (1951) RP2249.




The calibration of the spectrometer has been made
by the use of higher orders of standard atomic lines
that appear as first order lines in the visible and near
infrared spectrum. The emission lines from mercury,
krypton, argon, neon, and xenon were observed
from high current arcs, and the position of the grating
was indicated by a counter when the lines were detected.
This made it possible to obtain a wavelength
scale as a function of the counter readings.
By the use of the calibration curve for the grating
spectrometer, it was found that the wavelengths of
spectral lines can be measured to an accuracy of
±0.0001 μ. When absorption bands are measured,
the wavelengths can usually be determined to an
accuracy of ±0.001 μ. The grating spectrometer has
a high dispersion and the bands are wide, so that
there is an uncertainty in locating the position of
maximum absorption. This reduces the accuracy
of the wavelengths to the above mentioned value.
In order to have a check on the wavelength calibration,
two sources of radiation are usually recorded
simultaneously. One source is a tungsten-ribbon-filament
lamp, which is operated with a current of
35 amperes and serves as the radiator of the continuous
energy. The other source is a mercury or a
krypton arc. By means of mirrors, both sources are
focused on the entrance slit of the spectrometer.
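The use of higher orders can be illustrated with a short sketch. A standard line of wavelength λ observed in order n falls at the grating setting of a first-order wavelength nλ. The mercury wavelengths below (0.404656 and 0.435833 μ) are the familiar violet and blue lines; the counter readings and the linear interpolation between neighboring points are assumptions made only for illustration, since the paper does not give its counter readings or the form of its calibration curve.

```python
# Sketch: higher-order calibration points and a simple interpolation.
# Counter readings are hypothetical; linear interpolation is an assumption.


def apparent_wavelength(standard_wavelength, order):
    """First-order wavelength at which a line seen in a given order appears."""
    return order * standard_wavelength


def wavelength_from_counter(counter, points):
    """Interpolate linearly between (counter, wavelength) calibration points."""
    pts = sorted(points)
    for (c0, w0), (c1, w1) in zip(pts, pts[1:]):
        if c0 <= counter <= c1:
            return w0 + (w1 - w0) * (counter - c0) / (c1 - c0)
    raise ValueError("counter reading outside calibrated range")


if __name__ == "__main__":
    # Mercury lines recorded in the second order (wavelengths in microns).
    calibration = [
        (1000.0, apparent_wavelength(0.404656, 2)),  # hypothetical counter
        (2000.0, apparent_wavelength(0.435833, 2)),  # hypothetical counter
    ]
    print(round(calibration[0][1], 4))
    print(round(wavelength_from_counter(1500.0, calibration), 4))
```

The first calibration point corresponds to the 0.8093-μ second-order mercury line marked in figure 2.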

The experimental results are represented in figures
1 to 5, and show the spectra of didymium glass,
polystyrene, 1,2,4-trichlorobenzene, and the emission
spectrum of krypton. The didymium glass was
selected on the basis of its successful use for calibrations
in the visible region, where tests over a number
of years have shown that the bands do not change
in position with different batches. The polystyrene
and 1,2,4-trichlorobenzene were selected because they
are now in use in the longer wavelength region; thus
the total number of materials needed for calibration
would not be increased. One band of carbon disulfide
(CS₂) with a wavelength of 2.224 μ has been
calibrated. It is well separated from the other bands
of CS₂ and no figure is given as there should be little
difficulty in its identification.

Figure 1 is a trace of the spectrum of didymium 
glass as measured on a model 21 Perkin-Elmer spec- 
trometer containing a NaCl prism. The wavelengths 
measured by the use of the grating spectrometer are 
marked on the figure, and are also listed in table 1. 





TABLE 1. Calibration wavelengths for prism instruments

(… marks an entry that is not legible in the source; several rows for didymium glass and polystyrene are not legible.)

Wavelength    Uncertainty    Wave number    State     Thickness    Substance
(μ)           (±μ)           (cm⁻¹)

0.684         …              14616          Solid     …            Didymium glass
 .743 ¹       …              13455          do        …            Do
…             …              …              …         …            …
…             …              …              Film      …            Polystyrene
…             …              5212           Solid     …            Didymium glass
…             …              …              …         …            …
2.224         0.002          4495           Liquid    5            Carbon disulfide
1.6606        .0002          6020.3         do        5            1,2,4-Trichlorobenzene
2.1526        .0002          4644.2         do        5            Do
2.3126        …              4322.9         do        5            Do
2.4030        .0002          4160.3         do        5            Do
2.4374        …              4101.6         do        5            Do
2.494         .002           4009           do        5            Do
2.543         .002           3932           do        5            Do

¹ Average value of the maxima for 0.740 and 0.748 μ.
² Principal band observed with prism spectrometer.

FIGURE 1. The absorption spectrum of didymium glass under
prism dispersion from 0.6 to 2.0 μ; t=0.6 mm.

Figure 2 is a trace of the record obtained for the two
bands at 0.743 and 0.808 μ under high resolution.
The emission lines are produced by a mercury arc.
The wavelengths listed on the graph for these lines
are given as twice the standard values since they are
recorded in the second order. Figure 3 represents
part of the spectrum of polystyrene and shows the
regions used for calibration. The 1.681-μ band is
composed of one component, but the 2.170-μ band
has two side branches that are hardly noticeable on
the prism instrument. Figure 4 represents the spectrum
of 1,2,4-trichlorobenzene from 1.6606 to 2.543 μ.
The wavelengths are marked on the bands which
have been calibrated. In a previous publication
(see footnote 2) the bands with wavelengths 2.494
and 2.543 μ were not labeled; two other bands were
incorrectly labeled with these wavelengths. A glass
prism was used in the spectrometer for the measurement
of the emission spectrum of krypton, the
results being represented by figure 5. The width
of some of the observed lines indicated that several
components might be present. Further measurements
with the grating spectrometer proved that all
the lines measured by the prism instrument had
several components.

All of the calibrating wavelengths determined in
the present study are listed in tables 1 and 2.
The position of the maximum of absorption of those
bands which are not symmetrical or which have two
or more components is usually changed when measured


FIGURE 2. The absorption spectrum of didymium glass with mercury lines superimposed as obtained with a grating instrument
for the bands at 0.743 and 0.808 μ; t=6 mm. (Marked lines: 0.8093 μ and 0.8156 μ, second-order Hg. Axes: deflection against wavelength.)







FIGURE 3. The infrared absorption bands of polystyrene at
1.681 and 2.170 μ obtained with a prism instrument; t=0.6 mm.
(Axes: deflection against wavelength.)








FIGURE 4. Near infrared spectrum of 1,2,4-trichlorobenzene
observed with a lithium fluoride prism; 0.5 mm cell.
(Marked bands include 2.4030, 2.4374, 2.494, and 2.543 μ. Axes: deflection against wavelength.)


with different spectral slit widths. When narrow
slits corresponding to a spectral interval of 1 to
3 A are used, as with the grating spectrometer, the
true band shapes are observed and each component is
resolved. When spectral slit widths of 100 A or
greater are used, as with prism instruments, a change
in wavelength of the maximum of absorption of some
bands may occur. This effect has been observed by
K. S. Gibson,⁴ who found that the wavelength of
maximum absorption of the 0.743-μ band of didymium
glass occurred at 0.745 μ when the spectral
slit was changed from 100 to 200 A.





FIGURE 5. Infrared emission lines of krypton obtained with a
Perkin-Elmer spectrometer with a glass prism.
(Axes: deflection against wavelength.)


The calibration wavelengths of the bands of didymium
glass in the region of 0.68 to 0.88 μ, which are
listed in table 1, have been compared with the measurements
of Gibson.⁴ Of the six bands, which were
measured independently, the wavelength value of
only one differs by more than ±0.001 μ. This is
within the probable error of measurement. In this
study the wavelength of one band has been found to
be 0.880 μ, whereas Gibson reports a value of 0.883 μ.


TABLE 2. Calibrated krypton lines for prism instruments

Columns: wavelength, prism value (μ); wavelength, grating value (μ); wave number (vac).
(Only the prism values are legible in the source; the grating values and wave numbers are marked ….)

Wavelength, prism value    Wavelength, grating value    Wave number (vac)

1.278 ±0.002               …                            …
1.321 ±0.001               …                            …
1.374 ±0.001               …                            …
1.392 ±0.003               …                            …
1.442 ±0.003               …                            …
1.475 ±0.005               …                            …
1.533 ±0.005               …                            …
1.685 ±0.005               …                            …
1.817 ±0.001               …                            …




⁴ K. S. Gibson, NBS Circular 484 (1949).





In table 2 are given the wavelengths of the krypton
emission lines as measured by a prism instrument
and by the grating spectrometer. The grating values
were furnished by C. J. Humphreys⁵ of this Bureau.
These lines are not well suited for the calibration of
prism instruments with a thermocouple or bolometer
as the detector, but a prism instrument with a PbS
detector would have sufficient resolution to separate
many of the component lines of the krypton spectrum.
These lines would be useful for calibration
because of their high accuracy, which is about equal
to 1 part in a hundred thousand.

⁵ C. J. Humphreys, private communication.


WASHINGTON, April 2, 1952.







Journal of Research of the National Bureau of Standards Vol. 49, No. 1, July 1952 Research Paper 2339

Long-Tube Method for Field Determination of
Sound-Absorption Coefficients


Earle Jones, Seymour Edelman, and Albert London* 


A method has been developed that makes it possible to measure the sound-absorption
coefficient of acoustic materials that have been installed. The method uses a portable
version of the familiar impedance tube, in which measurements on the standing sound-wave
pattern are used to obtain the sound-absorption coefficient for normally incident sound. The
sound-absorption coefficient for randomly incident sound may be determined from the tube
measurements by reference to an extensive table given in the paper. The tube is coupled
to the acoustic material without defacing its surface so that the test is nondestructive. It
is useful for acceptance testing; the determination of the effects of aging, staining, and
redecoration of acoustic materials; and for quality-control purposes in the manufacture of
acoustic materials. The results of laboratory measurements on a large number of acoustic
tiles and plasters were compared with the results obtained with the standard reverberation-room
techniques. Field measurements were made on acoustic plasters. Large variations
in the absorption coefficient were observed and ascribed to faulty application and painting
procedures.


1. Introduction 


The question of how well installations of acoustic
materials comply with specifications written on the
basis of laboratory tests is one of considerable importance.
Laboratory samples of acoustic plasters
are constructed with care under controlled conditions
and usually yield repeatable results, even when
tested by different laboratories. However, when the
materials are installed the sound-absorption coefficients
depend to a large degree on such factors as the
care devoted to the mixing of the ingredients, the
skill of the plasterer, and the conditions under which
the plaster is cured. Any one of these factors may
materially affect the sound-absorbing properties of
the installation.

The problem of maintenance of acoustic-material 
installations is also of great economic importance to 
large-scale users of acoustic materials, like the 
Federal Government. The General Services Administration
and the Veterans Administration have many
acres of acoustic material to care for. When this 
material becomes soiled or otherwise unsightly with 
the passage of time, the choice of a poor method to 
restore its appearance may result in the loss of its 
sound absorption. In experimenting with various 
methods of redecoration, it is advantageous to make 
field measurements of the sound absorption of a 
small portion of the material before and after using a 
particular method of redecoration because it is 
difficult to reproduce aging and staining conditions in 
the laboratory. 

The “long-tube method” described in this paper
makes it possible to measure the sound-absorption
coefficient of acoustic materials that have been installed
without defacing them. It consists of setting
up a standing sound-wave pattern within a tube, one
end of which is closed by the acoustic material being
tested, and, from measurements of the maximum and
minimum pressures in the standing wave pattern,
computing the sound-absorption coefficient for normal
incidence. A calculation based on semiempirical
considerations is then used to determine, from the
normal incidence absorption coefficient, the random
incidence absorption coefficient that corresponds to
the value that would be determined by a reverberation-room
test. An explanation of this calculation is
given in the paper by London.¹

Measurements were made at 512 cycles per second only.

*Deceased.


2. Equipment 


In designing the tube, 512 c/s was chosen as the
operating frequency because this frequency is used in 
reverberation testing, and it is near the mean of the 
frequency range used in sound-absorption tests. 

It was desired to have the diameter of the tube as 
large as practicable in order to make measurements 
on as large a sample as possible and in order to reduce 
the effects of the attenuation due to the walls of the 
tube. The maximum diameter allowable was taken 
as just less than the cutoff diameter for the first
transverse mode. It was desired to make the tube 
long enough so that at least two minima and one 
maximum could be found but not too long for easy 
handling. In view of these considerations, and 
limited by available material, it was decided that the 
tube should be made of ‘{-in. brass tubing, 9 3/4 in. in 
diameter, and about 30-in. long. The sound source is 
a 6-in. permanent-magnet speaker mounted on a 1/8-in.
brass plate covering the back of the tube. Several 
types of microphones were tried. The most successful 
was a hearing-aid-type crystal microphone. This is 
mounted on the center bar of a metal H-shaped mem- 
ber. The two parallel bars of the H-member are 
parallel to the axis of the tube and bear brass wheels 
that roll on tracks along the tube. Springs press the 
wheels against the tracks with sufficient pressure to 
hold the microphone in position if the tube is rotated 
from horizontal to vertical. The microphone is 
moved along the tube by means of a steel pipe 
fastened to the H-member and passing through a 


1 Determination of reverberant sound absorption coefficients from acoustic im- 
pedance measurements, J. Acous. Soc, Am. 22, 263 (1950). 





close-fitting bearing in the back cover of the tube. A 
fine adjustment of the position can be made by 
fastening the steel pipe by means of a setscrew in the
back bearing and moving the H-member by a steel 
rod threaded through the pipe. 
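The two design limits mentioned above, the cutoff diameter for the first transverse mode and the length needed to contain two minima and one maximum, can be estimated with a short sketch. The sound speed of 343 m/s and the rigid-walled circular tube model (first transverse mode at 1.8412·c/(π·d)) are assumptions made here; the paper does not state the values it used.

```python
import math

# Sketch of the tube design limits at the operating frequency.
# Assumptions: rigid circular tube, sound speed 343 m/s.

FIRST_TRANSVERSE_ROOT = 1.8412  # first zero of the derivative of J1
INCH = 0.0254                   # meters per inch


def cutoff_diameter(frequency, sound_speed=343.0):
    """Largest diameter (m) below which only plane waves propagate."""
    return FIRST_TRANSVERSE_ROOT * sound_speed / (math.pi * frequency)


def half_wavelength(frequency, sound_speed=343.0):
    """Spacing (m) between successive pressure minima in the tube."""
    return sound_speed / (2.0 * frequency)


if __name__ == "__main__":
    d_cut = cutoff_diameter(512.0)
    spacing = half_wavelength(512.0)
    print(f"cutoff diameter = {d_cut / INCH:.1f} in.")
    print(f"minimum spacing = {spacing / INCH:.1f} in.")
```

Under these assumptions the 9 3/4-in. diameter lies below the cutoff diameter, and a 30-in. length is longer than two half-wavelengths, which is consistent with the stated requirement of at least two minima and one maximum.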

The tube is pivoted so that it can be used at any 
angle between vertical and horizontal. The pivot 
supports are on a jackscrew so that the tube can be 
moved vertically about 1 ft. The entire mount 
is on casters for easy movement from place to place. 

A bakelite housing cemented over the magnetic 
pot of the loudspeaker shields the back of the tube 
from extraneous sound. After much experimen- 
tation, it was found that a caulking-compound 
gasket would bond the tube to the material being 
tested without too much leakage of sound and with- 
out damaging the material. 

Figure 1 is a circuit diagram of the setup. The
signal from an oscillator drives the loudspeaker.
The microphone signal is amplified, filtered to remove
the effects of extraneous sound and harmonics,
and read on a sensitive vacuum-tube voltmeter
having an accurate decibel scale. Measurements
on successive minima in the tube indicate that damping
at the tube wall is negligible because of the
large diameter.


3. Theory 


The expression for the normal incidence absorption
coefficient in terms of the maximum and minimum
pressures in the tube is²

$$\alpha_0 = 1-\left[\frac{P_{max}-P_{min}}{P_{max}+P_{min}}\right]^2=\frac{4P_{max}P_{min}}{[P_{max}+P_{min}]^2}. \qquad (1)$$

This expression can be written in terms of $r$, the
ratio between $P_{max}$ and $P_{min}$, as

$$\alpha_0=\frac{4r}{[1+r]^2}. \qquad (2)$$


FIGURE 1. Block diagram of tube and associated equipment as
used in field measurements of sound-absorption coefficient.
(Labeled blocks: signal generator, loudspeaker, traveling microphone, acoustic material, electronic voltmeter.)


² H. O. Taylor, Phys. Rev. [2] 2, 270 (1913).





In terms of Δ, the difference in db between the maximum
and minimum pressures, the expression is

$$\alpha_0=\operatorname{sech}^2(A),\qquad A=\frac{\Delta}{40M},\qquad M=\log_{10}e. \qquad (3)$$

The relationship between $\alpha_0$ and $\alpha_r$, the absorption
coefficient for random incidence, is given by London
as

$$\alpha_r = 4\,\frac{1-(1-\alpha_0)^{1/2}}{1+(1-\alpha_0)^{1/2}}\left\{\ln 2-\frac{1}{2}-\frac{(1-\alpha_0)^{1/2}}{2}-\ln\left[1-(1-\alpha_0)^{1/2}\right]\right\}. \qquad (4)$$

In terms of $A$, $\alpha_r$ can be written as

$$\alpha_r = 4\,\frac{1-\tanh A}{1+\tanh A}\left\{0.1931-\ln\left[1-\tanh A\right]-\frac{\tanh A}{2}\right\}. \qquad (5)$$

From this relationship, table 1 was prepared
giving the value of $\alpha_r$, the reverberant sound-absorption
coefficient, for values of Δ db over the
range of Δ db usually found in testing acoustic
materials.


TABLE 1. Absorption coefficient for reverberant sound, $\alpha_r$,
in terms of ratio of $P_{max}$ to $P_{min}$, Δ db

(The body of the table is not legible in the source.)

Example: Suppose the observed difference in sound level, Δ db, between $P_{max}$
and $P_{min}$ is 14 db. The table gives $\alpha_r$ as 0.77.
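The tabulated example can be checked with a short calculation. This is a sketch under the assumption that the relations are read as $\alpha_0 = \operatorname{sech}^2 A$ with $A = \Delta/40M$, $M = \log_{10} e$, and London's relation in the tanh form with the constant 0.1931 taken as ln 2 − 1/2. The function names are hypothetical.

```python
import math

# Sketch: normal- and random-incidence coefficients from the level
# difference in db between the pressure maximum and minimum.

M = math.log10(math.e)


def alpha_normal(delta_db):
    """Normal-incidence coefficient, sech^2(A) with A = delta/(40 M)."""
    a = delta_db / (40.0 * M)
    return 1.0 / math.cosh(a) ** 2


def alpha_random(delta_db):
    """Random-incidence coefficient from the tanh form of London's relation."""
    t = math.tanh(delta_db / (40.0 * M))
    bracket = math.log(2.0) - 0.5 - math.log(1.0 - t) - t / 2.0
    return 4.0 * (1.0 - t) / (1.0 + t) * bracket


if __name__ == "__main__":
    for delta in (6.0, 10.0, 14.0, 20.0):
        print(delta, round(alpha_normal(delta), 3), round(alpha_random(delta), 2))
```

For a difference of 14 db the random-incidence value rounds to 0.77, in agreement with the example given with the table.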


4. Laboratory Check of Method 


Simulated field measurements were made on
samples of some 50 acoustic materials previously
tested in the Bureau's reverberation chamber.
Three different specimens of each material were
tested, and each specimen was tested three different
times. The average difference between the results
of the tube measurement and the reverberation-room
results was found to be −0.07. It was assumed
that the tube method gave results too low by this
amount, and +0.07 was applied as a correction to
each coefficient measured by the tube. After this
correction had been applied, the deviations of the
tube measurements from the reverberation-chamber
measurements ranged from +0.18 to −0.21. The
average of the absolute values of these deviations
was 0.05, and 70 percent of all of the deviations fell
between +0.05 and −0.05. The dispersion pattern
of the deviations of the tube-derived $\alpha_r$ coefficient
from $\alpha_{rev}$ for these materials is presented in figure 2.



The samples used for the simulated field measurements
included many types of prefabricated acoustic
materials. No correlation could be found between
the amount of the deviation and the type of surface
of the acoustic tile or between the amount of the
deviation and the thickness of the tile. If the
samples used for the simulated field tests are a fair
representation of all acoustic materials, it appears
that the coefficient found by means of the tube
method will be within ±0.05 of the coefficient found
in the reverberation chamber about 70 percent of the
time and within ±0.10 about 80 percent of the time.
As new materials are tested in the reverberation
chamber, simulated field tests are made on the same
samples so that the correction to be applied in the
field is being revised and kept up to date.


5. Field Procedure 


The field procedure used in determining the sound-absorption
coefficient by means of the long-tube
method is made clear in figure 3, which shows the
equipment in use, and also in the following description
of a test made on a ceiling. When mounted on
the casters, the top of the tube in the vertical position
can be set at any height from about 5 ft 2 in. to
5 ft 10 in. It is therefore necessary to use a scaffold
to raise the tube high enough to enable the top of the
tube to reach the ceiling. The tube is mounted on
the scaffold and the jackscrew run up until the top
of the tube just touches the ceiling, care being taken
to see that the end of the tube touches the ceiling
evenly. The caulking-compound gasket is then
pressed against the ceiling to make a firm seal, as
shown in figure 3. The oscillator is then tuned to
512 c/s by comparison with a tuning fork. The
microphone is moved along the tube by the steel
pipe until the reading on the vacuum-tube voltmeter
indicates that it is in the neighborhood of a point of
minimum pressure. The pipe is then clamped, and
the microphone is moved by the threaded rod until
the exact pressure minimum is found. The reading





FIGURE 2. Plot of difference in sound-absorption coefficients
measured by the long-tube method from the sound-absorption
coefficients measured by the reverberation-room method for
fifty different materials at 512 c/s.
(Ordinate: difference in absorption coefficients, −0.20 to +0.20. Abscissa: absorption coefficient in reverberation room.)


of the voltmeter in decibels is recorded. The micro- 
phone is then moved in the same way to a point of 
pressure maximum, and the reading of the voltmeter 
is recorded. The difference in the two voltmeter 
readings is Adb. The absorption coefficient corre- 
sponding to Adb is then found from the table and is 
corrected by adding 0.07. As described in section 
4, the addition of 0.07 to the coefficients found by 
the tube method makes them agree, on the average, 
with values obtained in the reverberation room. 
This procedure is repeated at a large number of 
points over the ceiling, and the average of the values 
found is taken as the absorption coefficient of the 
material at 512 c/s.
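
The reduction of the field readings described above can be sketched in a few lines. The sketch assumes that the table mentioned in the text follows the standard standing-wave relation for a tube, α = 4s/(s + 1)², where s = 10^(Δdb/20) is the ratio of maximum to minimum sound pressure; that assumption, the function names, and the sample readings are illustrative and are not taken from the paper.

```python
TUBE_CORRECTION = 0.07  # additive correction quoted in the text for 512 c/s


def tube_coefficient(delta_db):
    """Absorption coefficient from the max-min level difference in decibels.

    Assumes the standard standing-wave relation alpha = 4s / (s + 1)**2,
    with s the pressure standing-wave ratio (an assumption, see lead-in).
    """
    s = 10.0 ** (delta_db / 20.0)
    return 4.0 * s / (s + 1.0) ** 2


def ceiling_coefficient(delta_db_readings, correction=TUBE_CORRECTION):
    """Average of the corrected coefficients over all measured points."""
    corrected = [tube_coefficient(d) + correction for d in delta_db_readings]
    return sum(corrected) / len(corrected)


if __name__ == "__main__":
    readings = [12.0, 14.5, 13.2]  # hypothetical level differences, db
    print(round(ceiling_coefficient(readings), 2))
```

Averaging the corrected values point by point, rather than averaging the decibel readings first, matches the order of operations given in the text.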

In a series of measurements made on acoustic 
plaster, it was found that the absorption coefficient 
varies from point to point by considerable amounts, 
and that the difference in absorption coefficient is 
closely correlated with differences in the appearance of 
the material. On acoustic plaster, it is therefore im- 
portant to make measurements on as many different 
locations as practicable, and especially to sample as 
many points having different surface texture as 
practicable. Acoustic tile appears to be more uni- 
form from one sample to another, and it may not be 
necessary to sample as many points on a tile sample 
as on a plaster sample. 

If the tube is to be used for acceptance testing on 
a new installation, it is possible to obtain a better 
correction than the + 0.07 found from the average 
of many samples, as previously described. This 
can be done by making tube measurements on a 
sample of the material at the time the sample is 


Figure 3. Field measurement of sound-absorption coefficients being made by long-tube method.

















tested in the reverberation chamber. The coefficient
at 512 c/s is also found by the tube method,
averaging over many points on the same sample.
The difference between the coefficients found by the
reverberation-chamber method and by the tube
method is taken as the correction to be applied in
the field. This method gives a coefficient that will
usually agree with the reverberation-room coefficient
to within ±0.05. Tube measurements taken
on the same sample at different times have a maximum
spread of only 0.03.
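
The acceptance-testing correction just described amounts to a single subtraction. A minimal sketch follows, assuming uncorrected tube coefficients are available for both the laboratory sample and the installed material; the function names and numbers are hypothetical.

```python
def field_correction(reverb_coefficient, sample_tube_values):
    """Reverberation-chamber coefficient minus the mean tube coefficient
    measured at many points on the same sample."""
    tube_mean = sum(sample_tube_values) / len(sample_tube_values)
    return reverb_coefficient - tube_mean


def corrected_field_value(field_tube_values, correction):
    """Mean tube coefficient measured in the field, plus the sample-specific
    correction (used in place of the general +0.07)."""
    field_mean = sum(field_tube_values) / len(field_tube_values)
    return field_mean + correction


if __name__ == "__main__":
    corr = field_correction(0.60, [0.50, 0.54])      # hypothetical values
    print(corrected_field_value([0.45, 0.47], corr))
```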

The availability of a portable apparatus for non-
destructive field testing makes it possible to correlate
the appearance of a small area of acoustic plaster
with its sound absorption. It is believed that the
workmanship of the plasterer affects the sound absorption
of the finished plaster to a large extent,
but it is not known just how plaster should be applied
for best results. The details of workmanship
that are believed to affect sound absorption include
the pressure of the trowel on the plaster, time of
final troweling after application, consistency of mix,
and the texture imparted to the surface. Different
textures are obtained by the use of a stiff straw
brush or a nail perforator, or both. Considerable
difference of opinion exists concerning the importance
of each of these factors and their proper manipulation.
The development of the long-tube method
makes possible objective evaluation of the various
factors that are involved in the application of acoustic
plasters.

The long-tube method makes it possible to measure
the sound absorption of small areas, so that
samples of many different examples of workmanship
can be taken.

Other applications of the long-tube method have
included absorption measurements made on the
acoustic plaster of the ceilings of several construction
projects for the Federal Government. The results
of some of the tests are given in table 2.

A recent example of the usefulness of the long-
tube method was the task of measuring the sound-
absorption coefficients of two batches of commercial
acoustic tiles. It was desired to select from each
batch a smaller group of tiles nearly homogeneous in
sound absorption for use in a series of round-robin
laboratory tests. Figure 4 shows the percentage of
the total number of 12- by 12-in. tiles in each batch
that had the indicated sound-absorption coefficient.


Table 2. Absorption measurements
Number *
Location of points we + “+ Average at
measured
Building A at 512 c/s
1 38 0. 24 to 0. 8 04
2 Bal) -Bto .45 35
3 7 -82to .41 .
4 y to .7 uM
Building B at 512 c/s
Fifth-floor ceiling 42 0 Bto 0. #2 04%






































Figure 4. Comparison of distribution of sound-absorption coefficients in two different acoustic materials as measured by the long-tube method. (Panels: acoustic tile group "A" and acoustic tile group "B"; abscissa: sound absorption coefficient.)

Sample A consisted of 108 12- by 12-in. tiles, and sample B consisted of 195 12- by 12-in. tiles.


6. Conclusions 


The necessity of avoiding transverse modes sets 
an upper limit to the frequency at which a tube can 
be used. The lower limit of usable frequency is set 
by the requirement that the tube must not be too 
long for easy handling. These limits restrict the
use of the tube described here to frequencies near
512 c/s. However, other experiments indicate that
if the formula used here to calculate the absorption
coefficient is applied over the range of frequencies
from 256 to 2,048 c/s, the deviations between the
tube method (α_t) and the reverberation-chamber
method (α_rev) tend to be in opposite directions at
different frequencies, so that the noise coefficient
calculated from tube measurements agrees quite 
well with that calculated from reverberation-chamber 
measurements. 

It would be possible to cover the frequency range 
from 256 to 2,048 c/s by making use of a separate
tube of suitable size for each frequency. This has 
not been done because of the awkwardness of such an 
arrangement. However, it appears possible to re- 
place the long tube with a short tube, which, in 
essence, is a tube so short as to constitute a cavity. 
Such a device would be readily portable, and should 
operate over an extended frequency range. A pre- 
liminary investigation of this method has been 
started. 


Washington, April 16, 1952.




















Journal of Research of the National Bureau of Standards   Vol. 49, No. 1, July 1952   Research Paper 2340


Refractive Uniformity of a Borosilicate Glass After 
Different Annealing Treatments 


Leroy W. Tilton, Fred W. Rosberry, and Florence T. Badger 


In order to investigate claims that only low holding temperatures are adequate when
annealing optical glass for a highly homogeneous product, interferometric tests were made
on ten 2-inch cubes of borosilicate glass after an annealing at 515° C, and then the tests were
repeated after the cubes were reannealed, five at 490° and five at 530° C. For each of three
presentations of the cubes with respect to light paths, contours of differences in refractive
index were drawn at intervals of 5×10⁻⁷. It was found that index variations seldom exceeded
±1×10⁻⁶ in this annealed glass. From analyses of the data, it was concluded that there
need be little, if any, difference in degree of homogeneity, even if the holding temperature
during annealing is 30 or 40° C above the lowest feasible value.


1. Introduction 


It is well known that the refractive index and density
of glass are functions of annealing temperature
[1, p. 519] ¹. Within limits, the refractive index of a
number of silica glasses at room temperature has been 
found to increase linearly as lower annealing temper- 
atures are selected, provided the holding time is 
sufficient in each instance to allow the glass to come 
to a state of approximate structural equilibrium and 
further that the pieces are sufficiently small to per- 
mit cooling to proceed so rapidly that the equilibrium 
is not appreciably changed thereby. It is also possi- 
ble to terminate and control these processes of coming 
to equilibrium by decreasing the holding times at 
suitable given temperatures and thus obtain lower
indices at room temperature than for glass that is
annealed at the same temperatures for longer periods.

Because of these possibilities, increased attention 
is being given to the necessary procedures for adjust- 
ing the indices of glass by reannealing at higher or 
lower temperatures and with shorter or longer holding 
periods. By such means a higher degree of stand- 
ardization can be reached in the making of optical 
glass than has formerly appeared feasible. In this 
connection, however, the question of relative degrees 
of homogeneity has properly been raised. If optical 
glass is arrested, or “fixed”, by cooling while in the
process of sluggish readjustment from one condition 
of structural equilibrium to another, is it then as 
homogeneous as it would be if cooled from almost 
complete equilibrium at some one annealing temper- 
ature? Or is a borosilicate glass, for example, as 
homogeneous in an equilibrium condition correspond- 
ing to an annealing at 530° or 515° C as it would be in 
its more dense equilibrium condition corresponding 
to an annealing at 490° C? 

According to some views [2] the answers to one or 
both of the above questions seem to be negative, and 
there is widespread opinion that optical glass cannot 
be satisfactorily annealed and homogeneous unless it 
is as dense and high in refractive index as it is practi- 
cally possible to achieve by a so-called “full” or 
“limit” annealing at a rather low holding temperature.

¹ Figures in brackets indicate the literature references at the end of this paper.

To the extent that these views are valid, it seems
to the writers that their practical application is confined
to heat treatments where it may be customary
to employ very much higher treating temperatures 
and much more rapid cooling than is usual in the 
making of good optical glass. Their full acceptance 
would lead to longer and unnecessarily expensive 
programs of fine annealing of optical glass and could 
preclude much of the contemplated freedom in ad- 
justing the index of refraction for special purposes or 
for economical standardization. 

Such views regarding the necessity of low-tempera- 
ture annealing may be briefly considered under two 
main headings. The first of these is degree of homo- 
geneity, and the second is stability. Some of the 
arguments regarding inhomogeneity seem based on 
the fact that, during cooling, the surface necessarily 
cools earlier than the interior. (Such arguments 
seem to neglect the factor of relative rates of cooling.) 
Inasmuch as the surface attains higher index while 
the center is still unchanged, speed in cooling is
recommended to insure that the inhomogeneity so
produced is kept within permissible limits. Obviously,
then, it may be thought that only very low holding 
temperatures should be used in order that the read- 
justments of the outer portions shall be so sluggish 
that they are inconsequential for the purposes for 
which the glass is intended. This argument, as some- 
times presented, seems to overlook the important 
point that the contemplated inhomogeneity may be 
in large part only transitory. The interior portions 
in turn must follow through the same temperature 
regions, and the center may merely lag with respect 
to the surface in attaining a higher density. Only 
difference in cooling time, as between center and 
surface, during the very early stages of cooling can 
impress upon homogeneous glass a permanent differ- 
ence in properties. Since both the center and the 
surface cool at almost the same rate, after the “steady
state” is reached, it follows that (unless the holding
temperature is very high) the customary very slow 
(rather than fast) rates of cooling are initially desir- 
able, at least until the steady state is reached. Slow 
cooling can be used from any annealing temperature 
(holding temperature) with the attainment of final 
homogeneity to almost any desired degree. 

On the other hand, in the cooling of glass from a 
preheating temperature to an adequately high holding
temperature, it is obvious that time can be saved
by initial rapid cooling. Conceivably, also, if the 
selected holding temperature is unnecessarily high in 
the annealing range, some time can be saved there- 
after by again using a rapid rate of cooling, provided 
one can know just when to decrease the rate before 
reaching temperatures at which the lag in attainment 
of equilibrium demands minimum temperature dif- 
ferences between center and edges until a steady 
state is reached. 

The second point for consideration is the matter
of stability at ordinary temperatures. There seems
general agreement that the sluggish readjustments
that occur in the annealing range proceed more
slowly, and more or less exponentially, as the temperature
is decreased. Also it is known that many,
many months of heat treatment are required for the
production of very small changes in refractive index
at the lower temperatures of the annealing ranges,
temperatures that are nevertheless very high indeed
compared to room temperature. In the course of 25
years during which experiments of this nature have
been in progress at the National Bureau of Standards
[1, p. 519] no evidence of instability at room
temperature has been found in index of refraction of
annealed optical glass of good or even fair quality.
Many glass prisms used as standards of refractive
index have been measured and remeasured to six
decimal places over this 25-year period, and no definite
changes have been detected. Any systematic
changes as large as ±5×10⁻⁶ should have been
noticed if they occurred. Although most, or all, of
these glasses were annealed, some were included that
had merely been “pot-cooled.” Certainly, none of
these prisms were annealed at temperatures so low
that maximum attainable indices were even approximated.
Therefore, the writers suggest that such
changes in refractivity as may occur and be detected
in the course of months after extremely severe chilling
of glass are, perhaps, not of exactly the same
nature as those readjustments that occur in the annealing
range and certainly do not need consideration
in the fine annealing of optical glass.

Winter’s subsequent discussion of annealing prob- 
lems [3] serves to correct some impressions that 
were obtainable from her earlier papers. For 
example, in presenting the “freezing process” it is 
“emphasized that the cooling rate does not need 
to undergo any sudden change at 7;,; the variation 
of temperature can be continuous from 7, to room 
temperature if the cooling rate at each temperature 
is rapid enough to avoid any further transformation 
of glass”, and the process is called a “method of 
annealing resulting in glass that is structurally 
homogeneous although not entirely stabilized and 
it shows that a considerable gain of time can be 
realized with respect to the time of limit annealing.”
Obviously, the lower the holding temperature, the 
more successfully can this freezing process be applied. 
But lower holding temperatures are by no means 
necessary or exclusively desirable. The form of 
cooling curve from a preheating temperature to a 
relatively high holding temperature may be selected
entirely for economy of time. For this, an initially
rapid rate with progressive retardations is reasonable
provided the subsequent holding time shall be adequate
for structural homogeneity at that particular
holding temperature. However, in the practical
annealing of optical glass, economy in time of holding
will seldom permit selection of a holding temperature
so low that sudden cooling can follow without
seriously lowering the surface refractivity below that
of the central portions. From reasonably high 
holding temperatures we continue to suggest the usual 
procedures of cooling slowly at first, so that the 
rate for surfaces and center may be more nearly 
equal in the effective annealing region, and then 
following by very gradual increases in the cooling 
rate. 


2. Description of Glass Samples and 
Annealing Procedures 


In order to test the practical importance of dif- 
ferences in annealing, ten 2-in. cubes of borosilicate 
glass were prepared from pot glass and polished for 
inspection as to striae and initial strain, and also 
for refractive-index measurements with a precision 
refractometer. They were then all annealed at one 
time in one furnace by the Glass Section of the 
National Bureau of Standards, holding at 515°C 
for 12 hr (after a preheating at 555°C for 2 hr) 
and cooling about as rapidly as advisable after the 
initial cooling at about 1° an hour. The cubes 
(except No. 5) were enclosed in individual boxes 
of 3/64-in. aluminum and placed in circular array 
on perforated trays in air only. 

After repolishings, tests, and measurements to be 
described later, cubes 2, 4, 6, 8, and 10 were rean- 
nealed by holding at 530°C for 17 hr (after pre- 
heating at 555°C for 12 hr to insure complete cancel- 
lation of all effects of the initial annealing) and 
cooling from 530° to 500°C at 3/4 deg an hour, 
from 500° to 475°C at 1 deg an hour, etc. In this
process these cubes were again enclosed in their
individual boxes, which were circularly arranged
on small insulating bricks inside a large aluminum
box of 16-in. diameter and 8-in. depth, with walls 
5/8 in. thick. The large box was in turn separated 
from the iron box of the furnace by air and by 
insulating bricks on which it rested. 

Cubes 1, 3, 5, 7, and 9 were similarly arranged 
in the same furnace, and reannealed by holding 
at 490° C for 15 days (after preheating at 545° 
for 10 hr) and cooling slowly, from 490° to 425° 
at 4° an hour, etc. 

The first reannealing, at 530° C, and its relatively 
slow cooling should produce glass that is fixed during 
its incompleted progress from equilibrium condi- 
tions at 530° C toward conditions at some lower 
temperatures. This glass is, therefore, about as far 
below so-called “maximum density” as will ordinarily
occur in annealing as practiced for optical
glass. The reannealing at 490° can be expected 
to produce glass that approximates the room-tem- 
perature condition corresponding to an equilibrium 






TABLe 1. 


Initial conditions (‘‘pot-cooled” 


Strain 
birefrin- 
gence 


Refractive 
index, np 


Cube 


number Striae content 


my/cm 
51656 None 
S655 5 do 
51657 do 
51655 More than cube 6 
51657 : More than cube 4 


5165S 
51657 
51655 
51627 
51608 


Very little 
Ytol4 


s 
Stol4 Very little 


After annealing at 
515° C 


Ap 
(515 
minus 


Data on 2-in. cubes of borosilicate glass ' 


After subsequent annealings 


Strain bire- 


re »2 
Strain frigence 
birefrin- 
rence 3 z 
— 490° minus 530° minus 


515 515 


myp/cm my/tm Mp/icm 
2 
—16x10~5 


4 
3 
4 
4 


¹ The computed composition of this glass, in percentages by weight, was SiO₂, 66.6; B₂O₃, 12.2; Na₂O, 8.2; K₂O, 12.0; ZnO, 0.5; As₂O₃, 0.5.
² These measurements were made by O. H. Grauer in the Bureau's Glass Section by means of a graduated quartz wedge and a set of calibrated strain disks.


at 490° C. This produces a denser glass with a
higher refractivity than is ordinarily obtained in
practice. The initial annealing at 515° C is of
intermediate character, and corresponds more or 
less to good annealing practice. An important 
difference between the annealing at 515° C and 
the subsequent annealings was the use of the large 
aluminum box in each of the reannealings. 

The results of preliminary and subsequent exam- 
inations of the cubes are given in table 1. Details 
of the annealing schedules are given in figure 1. 
The average indices of these cubes after the pot 
cooling and after the annealings at 530°, 515°, and 
490° C were 1.51649, 1.51708, 1.51726, and 1.51803, 
respectively.
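
The average indices quoted above can be compared directly. The short sketch below uses only the four averages given in the text; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Average refractive indices of the cubes, as quoted in the text.
AVERAGE_INDEX = {
    "pot-cooled": 1.51649,
    "530 C": 1.51708,
    "515 C": 1.51726,
    "490 C": 1.51803,
}


def index_change(state_a, state_b):
    """Change in average index from state_a to state_b, in units of 1e-5."""
    difference = AVERAGE_INDEX[state_b] - AVERAGE_INDEX[state_a]
    return round(difference * 1e5)


if __name__ == "__main__":
    for a, b in [("pot-cooled", "515 C"), ("515 C", "490 C"), ("515 C", "530 C")]:
        print(a, "->", b, index_change(a, b), "x 1e-5")
```

On these figures the annealing at 515° C raised the average index by 77×10⁻⁵ over the pot-cooled value, the reannealing at 490° C raised it a further 77×10⁻⁵, and the reannealing at 530° C lowered it by 18×10⁻⁵.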


3. Interferometric Tests of Homogeneity


In the past it was considered that optical glass was 
annealed primarily to reduce internal strains and 
thus prevent birefringence or reduce it within the 
tolerance limit, say 5 or 10 mμ/cm. Now it is realized
that it is necessary to anneal primarily to get re- 
fractive uniformity throughout the volume of the 
glass. This can be accomplished by the use of 
annealing equipment and schedules so designed that 
the temperature differences within the glass are very 
small, say small fractions of 1° C, during the holding 
period of annealing and the early stages of cooling. 
Necessarily, then, the glass will be in an unstressed 
condition as well as homogeneous in refractivity. 


3.1. Qualitative Examinations 


After fine annealings, it is found that glass is
almost invariably within the specified birefringence
tolerance and the crucial testing, if any, for optical
uniformity is done on an interferometer. The
simplest and most convenient instrument for such
testing of optical glass is the Hilger prism interferometer
[4, p. 120] made by application of the Twyman
and Green principle to the Michelson interferometer


Figure 1. Temperature-time schedules for (1) annealing ten 2-inch cubes at 515° C; (2) reannealing five cubes at 530° C; and (3) reannealing five cubes at 490° C. (Ordinate: temperature, °C; abscissa: time, hours.)



as arranged for collimated beams. The essential 
difference from the Michelson is that after reunion 
at the diagonal, the interfering beams are brought 
by means of a lens, to a focus for observation or 
photographic recording. 

Unfortunately for the rapid extension of inter- 
ferometric testing of optical glass, it is necessary to 
polish two opposite surfaces. If the surfaces are 
optically plane and the fringes seen through the 
sample are straight and equispaced, then there is 
said to be no “error’’, and it may be concluded that 
the glass is either (1) homogeneous, (2) its gradients 
in refractivity are linear, or (3) any nonlinear gra- 
dient is parallel to the light beam. If the surfaces 
are accurately parallel as well as plane, then the 
component of the optical gradient transverse to the 
light beams is at once evidenced by the contour of 
the fringes. The essential planeness of poor surfaces 
can be effectively achieved by adding a suitable 
contact liquid and plane parallel plates of almost 
identical index. Some samples of borosilicate glass
with ground surfaces have been examined interfero- 
metrically when combined with methylphthalate and 
plane parallel (polished) plates as the windows. 





In practice it is assumed that glass showing no 
error will be satisfactory, even for work of the highest 
quality. The success of such indefinite and qualita- 
tive procedures in the examination of glass means 
that gradual index gradients in optical glass are 
usually harmless and approximate the effect of a 
very weak prism superimposed on the whole optical 
system as designed and constructed. An important 
exception is found whenever large prisms are made 
with internally reflecting surfaces. The Dove erect- 
ing prism is a good example. If a linear index 
gradient exists in a direction normal to the reflecting 
surface, then the beam transmitted by the prism 
will be astigmatic, even if the surfaces are perfect. 


3.2. Precise Quantitative Testing 


Occasionally, therefore, when selecting glass for
special purposes, such as penta and other large
reflecting prisms and the beam splitters and compensators
of interferometers, it is desirable to know
quantitatively that the existing degree of inhomogeneity
is confined within suitable limits. Twyman
and Perry [5], in 1922, outlined a method by which
this could be ascertained in the case of nearly plane
parallel plates of glass whose thickness would permit
interference fringes after reflection at the two
polished surfaces. Recent unprecedented demands
for very homogeneous glass for large wind-tunnel
interferometers and schlieren benches have stimulated
interest in tests of homogeneity, and the new
arcs with mercury 198 have made it possible to
extend materially the thickness of glass that can be
used in interferometry.

If a plate of glass of thickness, t₀, and index of
refraction, n₀, is placed normally in one arm of an
adjusted prism interferometer, the number of fringes,
m_t, seen by transmitted light between points 1 and 2
on the surface of the plate is

$$m_t = \frac{2}{\lambda}\left[t_0(\Delta n_1 - \Delta n_2) + (n_0 - 1)(\Delta t_1 - \Delta t_2)\right], \qquad (1)$$

and if the end reflectors of the interferometer are
covered, one sees fringes by reflection to the number
of

$$m_r = \frac{2}{\lambda}\left[t_0(\Delta n_1 - \Delta n_2) + n_0(\Delta t_1 - \Delta t_2)\right], \qquad (2)$$

where Δt and Δn are local variations in total thickness
and in average refractive index through the plate
on lines through the points specified by subscripts.

Twyman [4, p. 136] has indicated a method of
plotting contours for the transmission and the reflection
fringes and for computing both the inhomogeneity,
Δn, and the difference in glass path, Δt, according
to the equations

$$\Delta n = \frac{\lambda}{2t_0}\left[n_0 m_t - (n_0 - 1) m_r\right] \qquad (3)$$

and

$$\Delta t = \frac{\lambda}{2}\,(m_r - m_t), \qquad (4)$$

between any two points where the two systems of
fringes intersect. It is interesting to note that precisely
flat surfaces are not required. The writers,
in applying this method for accurately investigating
the degree of homogeneity in good optical glass, have
found it advantageous to photograph both reflection
and transmission fringes with a superposed grid of
fine wires in order to define numerous points between
which comparisons were to be made. The projections
of the negatives were measured by means of a
comparator and all wire intersections were precisely
located with respect to the fringe systems.

Such observations and the requisite computations
were carried out with great accuracy for each of the
three possible presentations of the cubes on which
this paper is written. It was found possible to determine
differences in the seventh decimal place of
refractive index. In this instance, highly precise
data were needed in order to distinguish between, or
assess, if possible, the relative merits of different
annealing procedures. However, this process is laborious
and too slow for acceptance or rejection tests
on glass, where in most cases only the fifth decimal
in index needs consideration.


3.3. Rapid Quantitative Testing


The following procedure, based on observations
of the ratio of m_r to m_t, is suggested as feasible, quick,
and sufficiently accurate for many cases.

For high-quality optical glass (Δn near zero), and
for appreciable values of Δt, it is evident from eq 1
and 2 that there are more fringes seen by reflection
than by transmission, the ratio being n/(n−1). On
the other hand, for poor optical glass with appreciable
values of Δn, and a plate of nearly uniform thickness
(Δt being small), the ratio is but little greater than
one. For borosilicate glass of near optical quality
the ratio of m_r/m_t varies from 3.0 to 1.0, and the mere
counting of the number of fringes between two points,
neglecting or approximating fractions, is usually
sufficient to determine this ratio with adequate precision
for useful estimates of Δn by an inspection of
values of m_r/m_t, such as listed in table 2.


Table 2. Ratios of m_r/m_t as functions of t₀ and Δn for Δt = λ and Δt = 10λ; n₀ = 1.517


Aan=1x<l0" An=1x<l0-* An=1xl0O" An=1xl0" 


St=10 Af=—A Af=10A Af=A AM=-1IOA Af=—A’A | Af=WA Af=) 


248 Ls 
70 1.W 
43 1.06 
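
The dependence of the ratio m_r/m_t on thickness, inhomogeneity, and wedge follows directly from eq 1 and 2, since the common factor 2/λ cancels. The sketch below is illustrative only: the function name is hypothetical, and the wavelength and thicknesses in the example are assumed values, not entries recovered from table 2.

```python
def fringe_ratio(t0, delta_n, delta_t, n0=1.517):
    """Ratio m_r / m_t from eq 1 and 2.

    t0 and delta_t must be in the same length unit; the factor 2/lambda
    common to both equations cancels in the ratio.
    """
    transmitted = t0 * delta_n + (n0 - 1.0) * delta_t
    reflected = t0 * delta_n + n0 * delta_t
    return reflected / transmitted


if __name__ == "__main__":
    wavelength_cm = 5.461e-5  # assumed wavelength, for illustration only
    for t0_cm in (1.0, 5.0, 10.0):  # assumed plate thicknesses
        for wedge in (10 * wavelength_cm, wavelength_cm):
            print(t0_cm, wedge, round(fringe_ratio(t0_cm, 1e-4, wedge), 2))
```

With Δn = 0 the function returns n₀/(n₀ − 1), and with Δt = 0 it returns unity, which are the two limits described in the text.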


From eq 1 and 2, it is evident that (m_r − m_t) =
2Δt/λ and that m_t, as observed, consists of two parts,
m_tt, owing to Δt only, and m_tn, owing to Δn only.
As m_tt = (n₀ − 1)(m_r − m_t), it is possible to deduce
directly from the observations m_tn = m_t − (n₀ − 1)
(m_r − m_t), or, in other words, to obtain the number
of transmission fringes ascribable solely to inhomogeneity,
Δn (averaged through a total thickness, t₀),
between paths corresponding to points 1 and 2 on
the glass surface. For each such fringe solely due
to inhomogeneity (that is, under the conditions
m_t = 1 and Δt = 0 in eq 1), Δn = λ/2t₀, and table 3 is
computed accordingly.


Table 3. Refractive-index inhomogeneity, averaged for thickness t₀, corresponding to a difference of one fringe


Δn for various wavelengths


Similarly, the number of fringes ascribable solely
to difference in thickness, Δt, is (m_r − m_t), and for
each such fringe difference, Δt = λ/2.
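
The one-fringe equivalents Δn = λ/2t₀ and Δt = λ/2 are easily evaluated for any wavelength and thickness. The following sketch is only an illustration of that arithmetic; the function name is hypothetical, and the list of wavelengths and thicknesses is assumed, except for the 5871 A krypton line and the 2-in. cube thickness, which are taken from the text.

```python
def delta_n_per_fringe(wavelength_angstrom, thickness_cm):
    """Index difference corresponding to one transmission fringe,
    lambda / (2 * t0), with the wavelength converted from angstroms to cm."""
    wavelength_cm = wavelength_angstrom * 1e-8
    return wavelength_cm / (2.0 * thickness_cm)


if __name__ == "__main__":
    for wavelength in (5461.0, 5871.0):       # angstroms
        for thickness in (1.0, 2.54, 5.08):   # cm
            value = delta_n_per_fringe(wavelength, thickness)
            print(wavelength, thickness, f"{value:.2e}")
```

For the 2-in. (5.08-cm) cubes and the 5871 A line, one fringe corresponds to about 5.8×10⁻⁶ in index, so contours at intervals of 5×10⁻⁷ require fringe readings good to better than a tenth of a fringe.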

The glass must be in temperature equilibrium.
The surfaces must approximate parallelism so that
the fringes are countable, but the degree of planeness,
as such, is unimportant. With an isotope mercury
source, thicknesses greater than 6 cm are usable.

Stated as a rule: To determine Δn between paths
2 and 1, count interference fringes, m_r, formed by
reflection and also the transmission fringes, m_t.
Multiply the difference, (m_r − m_t), by (n − 1), and
subtract the product from the observed transmission
fringes, m_t. Multiply the remainder by the appropriate
number taken from table 3.
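
The rule can be written out step by step and checked against eq 3, to which it is algebraically identical. This is a minimal sketch: the function names are hypothetical, and the multiplier λ/2t₀ is computed directly rather than read from table 3.

```python
def delta_n_by_rule(m_r, m_t, n, wavelength, t0):
    """Apply the stated rule; wavelength and t0 in the same length unit."""
    thickness_fringes = (m_r - m_t) * (n - 1.0)  # part of m_t due to delta t
    index_fringes = m_t - thickness_fringes      # part of m_t due to delta n
    return index_fringes * wavelength / (2.0 * t0)


def delta_n_by_eq3(m_r, m_t, n, wavelength, t0):
    """Eq 3: delta n = (lambda / 2 t0) * [n * m_t - (n - 1) * m_r]."""
    return wavelength / (2.0 * t0) * (n * m_t - (n - 1.0) * m_r)


if __name__ == "__main__":
    # Hypothetical counts: 7 reflection fringes, 3 transmission fringes.
    args = (7.0, 3.0, 1.517, 5.871e-5, 5.08)  # lengths in cm
    print(delta_n_by_rule(*args), delta_n_by_eq3(*args))
```

Fractional fringe counts may be passed in the same way when the readings justify them.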


4. Details Concerning Precision 
Measurements 


The source used in examinations of the 2-in. cubes 
was a krypton tube with a filter suited for trans- 
mission of the vellow line of wavelength 5871 A. 
The cubes were placed on a metal base provided with 


Ficure 3. 


Left to right 


leveling screws and a vertical frame in which threads 
were mounted to form a rectangular reference grid of 
horizontal and vertical lines at intervals of 1 em. 
When a nearly perfect cube thus mounted is inserted 
in one arm of a Hilger interferometer, and adjusted 
with one of its faces normal to the parallel beam of 
incident light, one can see by transmission how the 
glass affects the fringes that would otherwise be 
present in the interferometer path. “One can then 
compare the transmitted air-plus-glass fringes with 
the air-only fringes that can at the same time be seen 
above and at sides of the cube. After shielding the 
interferometer mirrors, one can view the fringe 
system that is formed by interference of light that is 
reflected at the front and rear surfaces of the cube. 
Figure 2 is a view of the interferometer with a cube 
in position for photographing the transmission 
fringes. 

For precisely determining the difference in order of 
interference between the central path and any of the 
24 other paths whose points of entrance and emer- 
gence are defined by the grid, it was found convenient 
to make three negatives, which later were projected 
for readings on a comparator at each of the grid 
intersections. All exposures were recorded on glass 
plates with a Panatomic-X emulsion. The plates 
were developed by the manufacturer’s recommended 

| procedure with D-19 developer. The reflection, or 
| glass-only, fringes were first exposed for a duration 
| of 15 minutes. After uncovering the end mirrors, 


| the second plate was exposed for 40 seconds to show 


Figure 2. Interferometer with glass cube in position for 


photographing the transmission fringes. 

















Fringes as photographed for the B presentation of cube 6 after annealing at 515° C. 


reflection fringes, transmission or air-plus-glass fringes, air-only fringes. 


25 
















































































































































































the air-plus-glass fringes surrounded by the air-only 
fringes. If the latter could be precisely set and 
maintained at one color, it might be possible to 
operate with only these two exposures, but it is 
found better to carefully remove the cube after the 
second exposure and immediately thereafter record 
the air-only fringes with a 40-second exposure. 
After reading all three plates at the grid intersections, 
one compares the air fringes in the second and third 
exposures for possible shifting of air fringes and 
makes any necessary corrections. Then one obtains 
the true transmission fringe readings by subtracting 
air-only from air-plus-glass readings. 

Of course it is necessary to ascertain on each plate 
the direction in which the whole order of interference 
increases. Also, the departure from parallelism of 
opposite faces of the cube must be so adjusted during 
the polishing that a conveniently measurable number 
of fringes is seen by reflection. For good glass the 
geometrical wedge, which can be determined by 
means of a precision optical gage, determines the 
direction of increase of the order of interference. In 
doubtful cases, and for confirmations, one can use 
local heating at one edge of the cube while the fringe 
system is being observed. 

With a knowledge of the index of refraction and 
the thickness of the cube, one can use the observed 
data on order of interference to compute Δn by 
means of eq (3); also, if desired, one computes Af 
according to eq (4). 


5. Contours of Refractive Inhomogeneity 


For completeness, the interference fringes obtain- 
able in a cube were photographed in each of the 
three mutually perpendicular presentations. Expo- 
sures were made only after the cubes had been in 
position for at least an hour after handling in a room 
where the temperature was varying but slowly. The 
three exposures for a given presentation were made 
in rapid succession to minimize errors causable by 
changes in temperature. The fringes obtained for 
the B presentation of cube 6 after the 515° C anneal- 
ing are shown in figure 3. 

Resulting values of Δn with respect to the central 
path were used in plotting contours at intervals of 
5×10⁻⁷ in refractive index. These maps are illus- 
trated in figure 4. 

Considering the averages for all three presenta- 
tions, cubes 1, 2, and 3 were among those that 
appeared particularly homogeneous after the first 
annealing at 515° C, and cubes 6, 7, and 10 were 
among those least homogeneous. After the re- 
annealing at 490° C, cube 1 was again very homo- 
geneous, but cube 3 was the least homogeneous of 
the five in that group. After the reannealing at 
530° C, cube 10 had changed its relative rating and 
then appeared to be the most homogeneous; cube 2 
had also changed, and seemed to be the least homo- 
geneous of that group. Cube 5 was ignored in this 
connection because, as mentioned in section 2, its 
environment, during the first annealing only, was 
uniquely unfavorable in that no thin aluminum box 







was provided. Although it had about twice as 
many contours as the other cubes after the 515° C 
treatment, it became of average condition after the 
490° C annealing. 

The facts given show that we are not dealing 
primarily with fixed chemical inhomogeneity in these 
cubes but with temperature effects impressible on 
the medium in a varying manner. As this borosili- 
cate glass has a refractive sensitivity of about 
5×10⁻⁵/1 deg C in annealing temperature, it is 
indicated that the gradients in the annealing furnace 
were about 0.04 deg C/in. lower inside the thin 
aluminum boxes than outside. Also, from the more- 
or-less-pronounced changes in comparative ratings 
of the cubes, it may be concluded that the gradients 
inside the boxes may have varied from box to box 
during a given annealing. 

In most of the contour maps the evidenced in- 
homogeneity is so small and evenly distributed that 
there is little readily detectable systematic arrange- 
ment. In cube 5 after the 515° C treatment, however, 
all three component maps agree in indicating a 
higher effective annealing temperature near the 
edge shown in the foreground. Similarly, cubes 4 
and 8 after the 530° C treatment evidence higher 
effective annealing temperatures at their rear edges. 
But to a considerable extent these contours lack 
marked systematic arrangement, and this suggests 
that any inhomogeneity caused by furnace gradients 
during holding times or by differences in time of 
cooling of surface and interior is probably masked. 
Unquestionably, there may exist in these contour 
maps some masking effects of room-temperature 
gradients at the time the interference fringes were 
photographed. It was on this account that after 
initial equilibrium, all three exposures were taken in 
close sequence for each presentation and all three 
presentations for each cube were completed in as 
short a time as possible. However, since the change 
in refractive index of this borosilicate glass is only 
17×10⁻⁷/1 deg C change in room temperature, it 
is evident that room-temperature variations would 
have to cause gradients of from 0.5 deg to 1.5 deg 
C/in. within the glass cubes in order wholly to 
account for the apparent inhomogeneities. 

There are other reasons why it cannot be assumed 
that these contours are largely accidental. In all 
cases the photographic negatives of interference 
fringes were measured by two observers and the 
requisite computations made independently. In 
general, the disagreements are small in the seventh 
decimal place of Δn, and only averaged results were 
used for plotting the contours. In many of these 
maps the run of adjacent contours gives internal 
evidence of precision well within the limits of one 
contour interval. Considering all data and the facts 
mentioned in this analysis, and although room 
temperature as well as furnace gradients may have 
somewhat influenced the final contours, it is certain 
that all of these cubes are very homogeneous. As 
will be seen in the following section, these data can 
be averaged to minimize the aspects of accidental 
character, and then analyzed to show clearly some 













Figure 5. Inhomogeneity of two cubes from cargo of German submarine and one sample of domestic optical glass rejected 
as unsatisfactory. 

Contour interval 40×10⁻⁷ in refractive index. 


systematic effects that are reasonably ascribable to 
temperature conditions during the annealings. 

In contrast with the homogeneity represented in 
figure 4, there are presented in figure 5 some similar 
results on cubes fashioned from German optical 
glass (annealing grade unknown) taken from the 
cargo of a submarine that was intercepted during 
World War II on its way to Japan, and also the 
results on a sample of domestic optical glass that was 
rejected because it had been unsatisfactorily annealed. 
Note that in the figure 5 diagrams the contour 
interval is 40×10⁻⁷, or eight times as large as in the 
cubes of figure 4. 


6. Comparative Results for Different 
Annealings 


For comparative purposes, a method of obtaining 
an average estimate of homogeneity for all cubes used 
in each annealing is desirable, and it is important and 
convenient to consider asymmetry as well as the 
radially distributed symmetrical changes in refrac- 
tivity. As a preliminary for both considerations, the 
24 observed values of Δn were considered, for each of 
the three presentations of each cube, according to 
their sign and their distance from the central path. 
As will be evident from figure 6, the paths are 
elements of the surfaces of five cylinders whose 
projections are shown as circles on a cube face. The 
radii are 1, √2, 2, √5, and 2√2 cm, with four values 
of Δn corresponding to each cylinder except the next 
largest, which has eight values. In each circle the 
maximum difference in values of Δn between diamet- 
rically opposite elements was taken as an arbitrary 
measure of the asymmetrical inhomogeneity for the 
corresponding cylindrical zone of that particular 
presentation of the cube. Such values averaged for 

















Figure 6. Schematic cylindrical shells for use in analysis of 
data on degree of homogeneity. 

The refractivities along 24 elements of five concentric cylinders were determined 
with respect to the axial path at center. In each shell the maximum difference 
in index between diametrically opposite elements is an arbitrary measure of the 
asymmetrical inhomogeneity at five distances from the center. 


three presentations can be plotted against radii for a 
measure of the maximum asymmetrical inhomogene- 
ity of each cube after each annealing. Such a curve, 
averaged for the initial data on all numbered cubes 
except 5, is characteristic of the first annealing. 
Similarly, later data on cubes 1, 3, 7, and 9 yield a 
curve for the reannealing at 490° C, and cubes 2, 4, 6, 
8, and 10 provide for the reannealing at 530° C. 
These curves of zonal variation in refractivity, 
figure 7, show the asymmetrical distribution of the 
existing inhomogeneities. 
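
The zonal grouping described above can be sketched in a few lines. The sketch assumes a 5 by 5 grid of paths on 1-cm centers (the spacing implied by the stated radii); the function names and the sample Δn values are hypothetical and serve only to illustrate the bookkeeping, not to reproduce any measured data.

```python
from collections import defaultdict
from math import isclose, sqrt

def shells(spacing_cm=1.0):
    """Group the 24 off-center points of a 5 x 5 grid by distance from the center."""
    groups = defaultdict(list)
    for i in range(-2, 3):
        for j in range(-2, 3):
            if (i, j) != (0, 0):
                groups[i * i + j * j].append((i, j))  # key: squared radius in grid units
    return {spacing_cm * sqrt(r2): pts for r2, pts in sorted(groups.items())}

def max_asymmetry(delta_n, points):
    """Largest |dn(P) - dn(-P)| over diametrically opposite points of one shell."""
    return max(abs(delta_n[p] - delta_n[(-p[0], -p[1])]) for p in points)

zones = shells()
# Hypothetical data: a linear index gradient of 2 units (of 1e-7) per cm along x.
sample = {(i, j): 2.0 * i for i in range(-2, 3) for j in range(-2, 3)}
for radius, pts in zones.items():
    print(f"r = {radius:.3f} cm, {len(pts)} paths, "
          f"max asymmetry = {max_asymmetry(sample, pts):.1f}")
```

Averaging the per-shell values over the three presentations of a cube, and then over the cubes of one annealing, would give one curve of the kind plotted in figure 7.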














Figure 7. Effect of temperature gradients during holding 
periods of annealing. 

[Plot of 10⁷ × maximum asymmetrical zonal Δn against radius of cylindrical shell, cm.] 

Since the orientation of asymmetry does not persist in given cubes from one 
annealing to another (fig. 4), the cause is not chemical in nature but merely 
residual thermal gradients of the order of 0.01 deg C per inch during annealings. 
Concerning cube 5, see p. 26. 


If for two different annealings the asymmetry does 
not persist in almost the same orientation within a 
given cube, then it may be concluded that the effec- 
tive causes are not chemical in origin but principally 
thermal in nature. This is certainly the case for 
cubes 1 to 10. Linear gradients of refractive index 
across the cubes cause straight-line curves of zonal 
heterogeneity, and larger index gradients are indi- 
cated by larger angles with the X-axis. An out- 
standing example of approximately linear gradients 
through a considerable volume of glass is given by 
cube 5 after the annealing at 515° C without its 
aluminum cover box, the values being fully five times 
the average for other cubes. From figure 7, then, 
it may be concluded that furnace temperature gra- 
dients, inside the aluminum covers, were reduced to 
about 0.01 deg C/in. during this annealing, and that 
without the thin aluminum covers the gradients 
might have been 0.05 deg C/in. 

Insofar as asymmetrical inhomogeneity is con- 
cerned, a valid comparison of the different annealings 
could be made by measuring the areas under the 
curves of figure 7. A satisfactory approximation is 
given by comparing averages of the ordinates for the 
observed radii. Excluding cube 5 for the initial 
annealing, the averaged ordinates are 6, 8, and 
8×10⁻⁷ for the annealings at 490°, 515°, and 530° C, 
respectively. These differences in homogeneity be- 
tween annealings are, therefore, so small that their 
significance is questionable. It would be necessary 
to conduct further experiments if the validity of 
differences of ±1 or 2×10⁻⁷ in index over 2 in. of 
glass path is to be established. Thus, insofar as 




















Figure 8. Effects attributable to duration of holding time 
and to rate of cooling. 

[Plot of 10⁷ × symmetrical radial Δn against radius of cylindrical shell, cm.] 

Inadequate holding at medium and low annealing temperatures can (if preceded 
by preheating) cause relatively lower index (less increase) at centers. The arrows 
indicate estimates of the relatively greater reduction of index (less increase) at 
edges during cooling. 


asymmetry of inhomogeneity is concerned, the data 
of figure 7 show that the reannealing at the compar- 
atively high temperature of 530° is essentially the 
equivalent of the one at the comparatively low 
temperature of 490° C. This may mean, chiefly, 
that the furnace-temperature gradients can be, and 
were, essentially the same for each of these anneal- 
ings. 

The curves of figure 7 give no indication of the 
radial gradients in index that may exist, symmetri- 
cally, from center to faces of the cubes. In order to 
compare the annealings in this respect, figure 8 was 
prepared. Here, as in figure 7, the ordinates are 
values of Δn averaged for the same cylindrical shells, 
but for this symmetrical result, a simple algebraic 
average of Δn is used to represent the refractivity of 
each shell as compared with its axis. For the anneal- 
ing at 530° C, it is evident that the outer portions of 
the glass cooled faster than the center with conse- 
quent lower index corresponding to a slightly higher 
(0.02° C) equilibrium temperature condition. For 
the annealing at 515° C the 12-hr holding period 
seems to have been inadequate for raising the index 
in the center as high as at the edge. A faster cooling 


























rate would have, to some extent, offset this index 
difference. At 490° C the holding period of 15 days 
was probably inadequate, but the cooling rate was 
almost right in order to compensate therefor. 

The actual average distributions of the inhomo- 
geneities after the three annealings are shown in the 
composite cubes of figure 9. The important fact is 
that all three of these annealings produce glass uni- 
form in index within approximately ±10×10⁻⁷. 
Such glass can be considered practically perfect, in- 
sofar as the users of optical glass are concerned, for 
any elements that can be manufactured from 2-in. 














Figure 9. 

(a) Homogeneity expressed as Δn×10⁷ for composite of five cubes 
(2, 4, 6, 8, and 10) annealed at 530° C. The initial cooling rate of 44 deg 
per hour was slightly too rapid so that the edges cooled faster than the 
center, with consequent higher effective annealing temperature and 
lower refractive index. 

(b) Homogeneity expressed as Δn×10⁷ for composite of nine cubes 
(1 to 10, omitting 5) annealed at 515° C. Following preheating, the holding 
period of 12 hours was probably not entirely adequate to raise the index 
at center as high as the edges. The cooling rate of 1° C per hour was not 
sufficiently rapid to entirely compensate by lowering the index at the 
edges. 

(c) Homogeneity expressed as Δn×10⁷ for composite of four cubes 
(1, 3, 7, 9) annealed at 490° C. The holding time of 15 days was almost 
adequate for raising the central index as high as that at the edge, and the 
initial cooling rate of 4¢ deg C per hour was so slow that the index at the 
edges was not materially lowered. 


cubes. Even for wavelengths as short as 0.4 μ the 
distortions that could be imposed on wave fronts 
cannot exceed Rayleigh’s limit of λ/4, unless the 
paths in glass of this quality are longer than 5 cm. 
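
The 5-cm figure can be checked by a rough estimate. The sketch below assumes that the extreme index difference is the full spread of ±10×10⁻⁷ (that is, 20×10⁻⁷) and that the wave-front distortion is simply that spread multiplied by the path length; the function name is illustrative only.

```python
def rayleigh_path_cm(wavelength_um, index_spread):
    """Path length (cm) at which index_spread * path equals a quarter wavelength."""
    quarter_wave_cm = (wavelength_um * 1e-4) / 4.0  # 1 micron = 1e-4 cm
    return quarter_wave_cm / index_spread

# +/-10e-7 about the mean is a total spread of 20e-7 in refractive index.
path = rayleigh_path_cm(0.4, 20e-7)
print(f"limiting path = {path:.1f} cm")
```

Under these assumptions the limiting path comes out at about 5 cm, in line with the statement above.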

The conclusion that borosilicate glass homogeneous 
within ±1×10⁻⁶ in refractive index can be obtained 
by annealings in which the holding temperature is 30 
or 40 deg C above the lowest feasible annealing 
temperature is contrary to certain ideas that have 
been widely presented and have obtained some cre- 
dence in recent years regarding an alleged practical 
superiority resulting from annealing at very low 





temperatures, and the alleged necessity of obtaining 
maximum ‘‘compaction” in order to obtain desirable 
homogeneity. On the other hand, the results ob- 
tained in this investigation are in full accord with 
ideas expressed by Tool [6] and associates concerning 
the possibilities of making useful adjustments in the 
refractive indices of very homogeneous optical glass 
by the choice, within limits, of suitable annealing 
temperatures. 


7. References 


[1] Arthur Q. Tool, Leroy W. Tilton, and James B. Saunders, 
J. Research NBS 38, 519 (1947) RP1793. 

[2] A. Winter, J. Am. Ceram. Soc. 26, 189-200, 277-284 
(1943). 

[3] A. Winter, J. Am. Ceram. Soc. 27, 266-274 (1944). 

[4] F. Twyman, Prism and lens making (Adam Hilger, 
London). 

[5] F. Twyman and J. W. Perry, Proc. Phys. Soc. 
34, 151 (1922). 

[6] A. Q. Tool and L. W. Tilton, Annealing as a means of 
producing desired refractivity changes in glass, read at 
meeting of Am. Ceram. Soc., Washington, D. C. (193?); 
A. Q. Tool, L. W. Tilton, and J. B. Saunders, Effect of 
heat treatment on the refractive index of glass, read at 
meeting of Opt. Soc. Am., Corning, N. Y. (1937); 
Changes caused in the refractivity and density of 
glass by annealing, J. Research NBS 38, 519 (1947) 
RP1793. 


Washington, January 31, 1952. 





Journal of Research of the National Bureau of Standards 


Vol. 49, No. 1, July 1952 Research Paper 2341 


Solution of Systems of Linear Equations by 
Minimized Iterations¹ 


Cornelius Lanczos 


A simple algorithm is described which is well adapted to the effective solution of large systems 
of linear algebraic equations by a succession of well-convergent approximations. 


1. Introduction 


In an earlier publication [14]² a method was 
described which generated the eigenvalues and eigen- 
vectors of a matrix by a successive algorithm based 
on minimizations by least squares.³ The advantage 
of this method consists in the fact that the successive 
iterations are constantly employed with maximum 
efficiency, which guarantees fastest convergence for 
a given number of iterations. Moreover, with the 
proper care the accumulation of rounding errors 
can be avoided. The resulting high precision is of 
great advantage if the separation of closely bunched 
eigenvalues and eigenvectors is demanded [16]. 

It was pointed out in [14, p. 256] that the inversion 
of a matrix, and thus the solution of simultaneous 
systems of linear equations, is contained in the 
general procedure as a special case. However, in 
view of the great importance associated with the 
solution of large systems of linear equations, this 
problem deserves more than passing attention. 

It is the purpose of the present discussion to adapt 
the general principles of the previous investigation 
to the specific demands that arise if we are not inter- 
ested in the complete analysis of a matrix but only 
in the more special problem of obtaining the solution 
of a given set of linear equations 

$$A y = b_0, \tag{1}$$

with a given matrix $A$ and a given right side $b_0$. 
This is actually equivalent to the evaluation of one 
eigenvector only, of a symmetric, positive definite 
matrix. It is clear that this will require considerably 
less detailed analysis than the problem of construct- 
ing the entire set of eigenvectors and eigenvalues 
associated with an arbitrary matrix. 


2. The Double Set of Vectors Associated 
With the Method of Minimized Iterations 


The previous investigation [14] started out with 
an algorithm (see p. 261) which generated a double 
set of polynomials, later on denoted by $p_k(x)$ and 
$q_k(x)$ (see p. 274). Then a second algorithm was 


¹ The preparation of this paper was sponsored (in part) by the Office of Naval 
Research. 

² Figures in brackets indicate the literature references at the end of this paper. 

³ The present paper is a natural sequel to the previous publication and depends 
on the previous findings. The reader's familiarity with the earlier development 
is assumed throughout this paper; the symbolism of the present paper is in har- 
mony with that used before, in particular the notation pq, if applied to vectors, 
shall mean the scalar product of these two vectors. 





introduced, called ‘minimized iterations’, which 
avoided the numerical difficulties of the first algor- 
ithm (see p. 287) and had, in addition, theoretically 
valuable properties for the solution of differential 
and integral equations (p. 272). 

In this second algorithm, however, only one-half 
of the previous polynomials were represented, cor- 
responding to the $p_k(x)$ polynomials whose coeffi- 
cients appeared in the full columns of the original 
algorithm [14, (60)]. The polynomials $q_k(x)$, asso- 
ciated with the half columns of [14, (60)], did not 
come into evidence in the later procedure. 

The vectors $b_k$, generated by minimized itera- 
tions, correspond to the polynomials $p_k(x)$ in the 
sense 

$$b_k = p_k(A)\, b_0. \tag{2}$$


We should expect that the vectors generated by 
$q_k(A) b_0$ might also have some significance. We will 
see that this is actually the case. It is of consid- 
erable advantage to translate the entire scheme 
[14, (60)] into the language of minimized iterations, 
without omitting the half columns. We thus get 
a double set of vectors, instead of the single set 
considered before. 

The additional work thus involved is not super- 
fluous because the second set of polynomials can be 
put to good use. Moreover, the two sets of poly- 
nomials belong logically together and complement 
each other in a natural fashion. From the practical 
standpoint of adapting the resultant algorithm to 
the demands of large scale electronic computers, we 
gain in the simplicity of coding. The recurrence 
relations which exist between the polynomials 
$p_k(x)$, $q_k(x)$ are simpler in structure than the recur- 
rence relation obtained by eliminating the second 
set of polynomials. 

We want to simplify and systematize our notations. 
The vector obtained by letting the polynomial 
$p_k(A)$ operate on the original vector $b_0$ shall be 
called $p_k$: 

$$p_k = p_k(A)\, b_0. \tag{3}$$

We thus distinguish between $p_k$ as a vector and 
$p_k(A)$ as a polynomial operator. Hence the notation 
$p_k$ will take the place of the previous $b_k$. Cor- 
respondingly we denote the associated second set 
of vectors by $q_k$: 

$$q_k = q_k(A)\, b_0. \tag{4}$$





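Both of these vector sets have invariant signifi- 
cance. The vectors $p_k(A) b_0$ can be characterized 
as the solution of the following minimum problem. 
Form the polynomial 

$$
\begin{aligned}
p_k &= \left[A^k - \left(\alpha_0 + \alpha_1 A + \cdots + \alpha_{k-1} A^{k-1}\right)\right] b_0,\\
p_k^* &= \left[A^{*k} - \left(\alpha_0 + \alpha_1 A^* + \cdots + \alpha_{k-1} A^{*\,k-1}\right)\right] b_0^*,
\end{aligned}
\tag{5}
$$

determining the coefficients $\alpha_i$ by the condition that 
the square of the length of $p_k$, that is, the invariant 
$p_k p_k^*$, shall become a minimum. 

The vectors $\bar q_k(A) b_0$ can be characterized as the 
solution of the following minimum problem. Form 
the polynomial 

$$
\begin{aligned}
\bar q_k &= \left[1 - \left(\bar\alpha_1 A + \bar\alpha_2 A^2 + \cdots + \bar\alpha_k A^k\right)\right] b_0,\\
\bar q_k^* &= \left[1 - \left(\bar\alpha_1 A^* + \bar\alpha_2 A^{*2} + \cdots + \bar\alpha_k A^{*k}\right)\right] b_0^*,
\end{aligned}
\tag{6}
$$

determining the coefficients $\bar\alpha_i$ by the condition 
that the square of the length of $\bar q_k$, that is, the 
invariant $\bar q_k \bar q_k^*$, shall become a minimum. 

In the case (5) the highest coefficient of the 
polynomial is normalized to 1, and in the case 
(6) the lowest coefficient is unity.⁴ 

After the minimization we shall normalize, for the 
sake of convenience, the highest coefficient of $\bar q_k$ 
once more to 1; hence we define 

$$q_k = -\frac{1}{\bar\alpha_k}\, \bar q_k. \tag{7}$$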

While the vectors $p_k$ and $p_k^*$ form a biorthogonal 
set of vectors [14, p. 266], this cannot be said of the 
vectors $q_k$. However, the vectors $q_k$ are of particular 
importance for the solution of sets of linear equations. 
If we form the ratio 

$$\bar y_{k-1} = \frac{q_k(A) - q_k(0)}{-q_k(0)\, A}\, b_0, \tag{8}$$

we have obtained a solution of the equation 

$$A \bar y_{k-1} = b_0 - \bar q_k. \tag{9}$$

Hence we see that if the vectors $q_k$ are at our disposal, 
we can at every step of our algorithm obtain an 
optimum solution of smallest residual. Indeed, the 
vector $\bar q_k$ was defined by the condition that it shall 
have the smallest length among all the linear com- 
binations which can be formed with the help of the 
successive iterates 

$$b^{(m)} = A b^{(m-1)} = A^m b_0 \tag{10}$$

up to the order $k$. 

The alternate solution 

$$y_{k-1} = \frac{p_k(A) - p_k(0)}{-p_k(0)\, A}\, b_0 \tag{11}$$

gives a larger residual for the same $k$, except if we 
proceed to the very end of the process, $k = n$, in 
which case the residual vanishes for both $\bar y_{n-1}$ and $y_{n-1}$, 
and both coincide with the exact solution $y$: 

$$\bar y_{n-1} = y_{n-1} = y. \tag{12}$$

⁴ The definition of the vectors $p_k$ and $p_k^*$ reveals the following remarkable 
property of this vector set. Let $b_0$ remain unchanged but the matrix $A$ be 
changed to $A - \lambda I$, where $\lambda$ is arbitrary. The vectors $p_k$, $p_k^*$ remain invariant 
with respect to this transformation. The same cannot be said of the vectors 
$q_k$, $q_k^*$. 


3. The Complete Algorithm for Minimized 
Iterations 


We will now proceed to the exposition of the 
completed algorithm, which does not omit one-half 
of the basic algorithm [14, (60)] but translates the 
entire algorithm into the frame of reference of min- 
imized iterations. 

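The algorithm [14, (60)] generated a double set of 
polynomials, mutually interlocked by the following 
recurrence relations: 

$$
\begin{aligned}
p_{k+1}(x) &= \rho_k\, p_k(x) + x\, q_k(x),\\
q_{k+1}(x) &= \sigma_k\, q_k(x) + p_{k+1}(x).
\end{aligned}
\tag{13}
$$

Elimination of the $q_k(x)$ leads to the three-term 
recurrence relation for the $p_k(x)$ alone: 

$$p_{k+1}(x) = (x - \alpha_k)\, p_k(x) - \beta_{k-1}\, p_{k-1}(x) \tag{14}$$

with⁵ 

$$\alpha_k = -(\rho_k + \sigma_{k-1}), \qquad \beta_k = \rho_k \sigma_k. \tag{15}$$

On the other hand, elimination of the $p_k(x)$ leads to 
the three-term recurrence relation of the $q_k(x)$ alone: 

$$q_{k+1}(x) = (x - \bar\alpha_k)\, q_k(x) - \bar\beta_{k-1}\, q_{k-1}(x) \tag{16}$$

with 

$$\bar\alpha_k = -(\rho_k + \sigma_k), \qquad \bar\beta_k = \rho_{k+1} \sigma_k. \tag{17}$$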

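Replacing $x$ by $A$, $A^*$ and letting these polynomials 
operate on $b_0$, $b_0^*$, we obtain the following relations 
between the vectors $p_k$ and $q_k$: 

$$
\begin{aligned}
p_{k+1} &= \rho_k p_k + q_k', & p_{k+1}^* &= \rho_k p_k^* + q_k^{*\prime},\\
q_{k+1} &= \sigma_k q_k + p_{k+1}, & q_{k+1}^* &= \sigma_k q_k^* + p_{k+1}^*.
\end{aligned}
\tag{18}
$$

The notation “prime” refers to the multiplication 
by the matrix $A$: 

$$q_k' = A q_k, \qquad q_k^{*\prime} = A^* q_k^*. \tag{19}$$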

⁵ The negative signs in (14) and (16) are chosen because for symmetric and 
positive definite matrices an important prediction can be made concerning the 
signs of the fundamental scalars. The original algorithm which introduces the 
$h_k$ and $h_k'$ coefficients reveals [14, p. 262] that both of these coefficients arise from 
a minimization process and both of them have the significance of the square of a 
length. In the case of symmetric (or Hermitian) and positive definite matrices 
the metric is real and the square of a length necessarily positive. Hence the 
$h_k$ and $h_k'$ are all positive, the $\rho_i$, $\sigma_i$ all negative. This makes the $\alpha_i$ and $\beta_i$ (and 
likewise the $\bar\alpha_i$, $\bar\beta_i$) always positive for such matrices. 





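The biorthogonality of the vectors $p_k$ gives, if we 
multiply the upper left equation (18) by $p_k^*$: 

$$\rho_k = -\frac{q_k'\, p_k^*}{p_k\, p_k^*}. \tag{20}$$

Moreover, the same equation shows the orthogon- 
ality of $q_k'$ to all $p_m^*$, except $m = k$ and $k+1$. In 
particular 

$$(q_k'\, p_{k-1}^*) = (q_k^{*\prime}\, p_{k-1}) = 0. \tag{21}$$

We now prime the second equation and multiply on 
both sides by $p_k^*$. This gives: 

$$\sigma_k = -\frac{p_{k+1}'\, p_k^*}{q_k'\, p_k^*} = -\frac{p_{k+1}\, p_{k+1}^*}{p_k^*\, q_k'}. \tag{22}$$

We now introduce the scalars $h_k$ and $h_k'$ by putting 

$$h_k = p_k p_k^*, \qquad h_k' = p_k q_k^{*\prime} = p_k^* q_k', \tag{23}$$

and obtain: 

$$\rho_k = -\frac{h_k'}{h_k}, \qquad \sigma_k = -\frac{h_{k+1}}{h_k'}. \tag{24}$$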

This completely translates the “progressive algo- 
rithm” into the language of minimized iterations. 
The $h_k$ numbers are identical with the $h_k$ of the 
scheme [14, (60)] (p. 263), corresponding to the full 
columns 0, 1, 2, . . ., while the $h_k'$ give the h-numbers 
of the half columns 0.5, 1.5, 2.5, . . . . 

A remarkable relation between the $\rho_i$ and the de- 
terminant of the matrix $A$ can be obtained if in the 
first equation of (13) we substitute $x = 0$: 

$$p_{k+1}(0) = \rho_k\, p_k(0), \tag{25}$$

and thus 

$$p_k(0) = \rho_0 \rho_1 \cdots \rho_{k-1}, \tag{26}$$

$$p_n(0) = \rho_0 \rho_1 \cdots \rho_{n-1}. \tag{27}$$

Since $p_n(A) b_0 = 0$ yields the characteristic equation 
of the matrix $A$, $(-1)^n p_n(0)$ must be the determinant 
associated with the matrix $A$. The determinant of 
$A$ is thus obtained as the product of all the $\rho_i$, 
multiplied by $(-1)^n$: 

$$|A| = (-1)^n \rho_0 \rho_1 \cdots \rho_{n-1}. \tag{28}$$
0) : 
In the following sketch of the general work scheme 
we will restrict ourselves to the particularly impor- 
tant case of symmetric matrices. This suffices for 
the purpose of solving linear equations that can al- 
ways be symmetrized, by transforming the originally 
given set 

$$G y = g \tag{29}$$

into 

$$A y = b_0, \tag{30}$$

where 

$$A = G^* G, \qquad b_0 = G^* g.$$

The matrix $A$ is now symmetric and positive definite. 
In this case the general scheme is reduced to one 
half of its original size, since 

$$A = A^*, \qquad b_0 = b_0^*.$$

⁶ The same algorithm shows another remarkable property of the $q_i$ vectors. 
These vectors do not form an orthogonal set because the polynomials $q_i(A)$ have 
the property to give orthogonality only if they operate on $\sqrt{A}\, b_0$ rather than $b_0$ it- 
self. But then by the associative law $[q_i(A)\sqrt{A}\, b_0]\,[q_k(A)\sqrt{A}\, b_0]^* = 0$ implies 
$[q_i(A) b_0]\,[q_k(A) A b_0]^* = 0$, which gives $q_i q_k^{*\prime} = 0$ $(i \neq k)$. This means that in the 
following work scheme the first rows (the p vectors) form an orthogonal set, but 
in addition the second and third rows form likewise a mutually orthogonal (biorthogo- 
nal) set. 

We need not distinguish between $p_k$ and $p_k^*$, $q_k$ and 
$q_k^*$, since our reference system is orthogonal and the 
adjoint vector coincides with the vector itself. 

The actual construction of the symmetrized matrix 
$A$ is a very “expensive” operation, since it is equiva- 
lent to $n$ matrix multiplications of the type $Ab$. 
Actually, we never need the matrix $A$ itself but only 
$A$ operating on a certain vector $b$. By the associa- 
tive law $(G^*G)b = G^*(Gb)$. Hence the operation $Ab$ 
is equivalent to the performing of the two successive 
matrix multiplications $b^{(1)} = Gb$ and $b^{(2)} = G^* b^{(1)}$. 
This requires $2n^2$ multiplications, compared with 
$\tfrac{1}{2}n^2(n+1)$ multiplications required for constructing 
$G^*G$. 
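
The saving can be illustrated with a small sketch. It assumes a real matrix $G$, so that $G^*$ is simply the transpose; the helper names and the 3 by 3 test data are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
from fractions import Fraction

def matvec(M, v):
    """Matrix M (list of rows) applied to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a real matrix, standing in for G* in the real case."""
    return [list(col) for col in zip(*M)]

def apply_symmetrized(G, b):
    """Return (G*G) b computed as G*(G b), without ever forming G*G."""
    return matvec(transpose(G), matvec(G, b))

def multiplication_counts(n):
    """(cost of one product G*(Gb), cost of forming the symmetric matrix G*G)."""
    return 2 * n * n, n * n * (n + 1) // 2

# Hypothetical test data.
G = [[Fraction(2), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(3)],
     [Fraction(1), Fraction(0), Fraction(1)]]
b = [Fraction(1), Fraction(2), Fraction(3)]

print([str(x) for x in apply_symmetrized(G, b)])
print(multiplication_counts(100))
```

For n = 100 the two counts are 20,000 multiplications per application against 505,000 for constructing $G^*G$ once, so the two-step product is the cheaper route as long as the number of applications stays well below n/4.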

Every cycle in the following iteration scheme 
consists of the construction of three vectors, viz., 
$p_k$, $q_k$, $q_k'$. The third is merely the matrix $A$ applied to $q_k$. 
Hence the problem is reduced to the construction of 
the vectors $p_k$ and $q_k$. In the following symbolic 
work scheme (34) the sequence of operations is indi- 
cated by going from row to row, and in each row 
from the left to the right: 

$$
\begin{array}{ll}
p_0 = b_0, & h_0 = p_0 p_0,\\
q_0 = b_0, & \\
q_0' = A q_0, & h_0' = p_0 q_0',\\
p_1 = \rho_0 p_0 + q_0', & h_1 = p_1 p_1,\\
q_1 = \sigma_0 q_0 + p_1, & \\
q_1' = A q_1, & h_1' = p_1 q_1',\\
\cdots &
\end{array}
\tag{34}
$$


This scheme is characterized by great uniformity and 
is well suited to coding for large scale machines. 
The generation of each new pair of $p_k$, $q_k$ vectors 
occurs constantly by the same scheme and involves 
for both vectors uniformly the immediately preceding 
vector and the penultimate vector (we skip the vector 
between). For example, $p_1$ is obtained as a combi- 
nation of $q_0'$ and $p_0$ (we skip $q_0$), whereas $q_1$ is obtained 
as a combination of $p_1$ and $q_0$ (we skip $q_0'$). The 
immediate predecessor is merely added, whereas the 
earlier predecessor is always multiplied by the nega- 
tive ratio of the last two h-numbers ($h_0'$ and $h_0$ in the 
case of $p_1$, $h_1$ and $h_0'$ in the case of $q_1$). 
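
The cycle just described can be sketched in code for the symmetric, positive definite case. This is an illustrative sketch only: the function names are hypothetical, exact fractions are used so that the vanishing of the last p-vector can be tested exactly, and the 4 by 4 test matrix (a second-difference matrix whose determinant is 5) and the starting vector are assumptions made for the demonstration.

```python
from fractions import Fraction

def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Matrix A (list of rows) applied to vector v."""
    return [dot(row, v) for row in A]

def minimized_iterations(A, b0):
    """Symmetric work scheme: build p_k, q_k and the scalars rho_k, sigma_k, h_k, h_k'."""
    A = [[Fraction(x) for x in row] for row in A]
    p = [Fraction(x) for x in b0]      # p_0 = b_0
    q = list(p)                        # q_0 = b_0
    out = {"p": [p], "q": [q], "rho": [], "sigma": [], "h": [], "h_prime": []}
    for _ in range(len(b0)):
        h = dot(p, p)                  # h_k  = p_k . p_k
        q_prime = matvec(A, q)         # q_k' = A q_k
        h_prime = dot(p, q_prime)      # h_k' = p_k . q_k'
        rho = -h_prime / h             # rho_k = -h_k'/h_k
        p_next = [rho * a + c for a, c in zip(p, q_prime)]  # p_{k+1} = rho_k p_k + q_k'
        out["h"].append(h)
        out["h_prime"].append(h_prime)
        out["rho"].append(rho)
        out["p"].append(p_next)
        if not any(p_next):            # the scheme halts when p vanishes
            break
        sigma = -dot(p_next, p_next) / h_prime              # sigma_k = -h_{k+1}/h_k'
        q = [sigma * a + c for a, c in zip(q, p_next)]      # q_{k+1} = sigma_k q_k + p_{k+1}
        out["sigma"].append(sigma)
        out["q"].append(q)
        p = p_next
    return out

def determinant_from_rho(rho):
    """(-1)^n times the product of the rho_k; valid when the scheme ran the full n steps."""
    det = Fraction(1)
    for r in rho:
        det *= -r
    return det

# Hypothetical test case: 4 x 4 second-difference matrix, b_0 = first unit vector.
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
b0 = [1, 0, 0, 0]
result = minimized_iterations(A, b0)
print("rho:", [str(r) for r in result["rho"]])
print("determinant:", determinant_from_rho(result["rho"]))
```

For this test matrix the scheme stops with a vanishing $p_4$ and the product of the $\rho_k$ reproduces the determinant 5. With floating-point arithmetic the exact test for a vanishing vector would have to be replaced by a tolerance, which is a design choice the sketch leaves open.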

It may help the coder to have a geometric picture 
of the scheme as a whole, such as the scheme that 
might profitably be used by a desk computer. In 
such an arrangement the $\rho_k$ and $\sigma_k$ factors should be 
placed in front of the respective rows that they mul- 
tiply. Hence we keep a column free in front of the 
vector scheme and write down $\rho_0$ immediately in 
front of $p_0$, $\sigma_0$ in front of $q_0$, and so on. Moreover, 
it is of advantage to carry an extra column at the 
end of the vector scheme which makes the vectors 
(n+1)-dimensional instead of n-dimensional. This 
extra column does not participate in the formation of 
the $h_k$ and $h_k'$, but otherwise we operate with it 
exactly as with the other columns. The element 
that completes $q_k'$ is always put equal to zero. The 
first two vectors $p_0$ and $q_0$ are completed by 1. 

This “surplus” column provides two important 
scalars, namely, $p_k(0)$ and $q_k(0)$. The last row gives 
$p_n$, which is the null vector. The “surplus” element 
$p_n(0)$ associated with $p_n$ terminates the algorithm 
and gives the determinant of $A$, multiplied by 
$(-1)^n$. 

The following numerical example is intentionally 
simple, since the aim is to display the operations 
rather than the numerical details. For the same 
reason the fractions encountered are not changed 
into decimals but left in fractional form. 

[Numerical work scheme for the example, n = 4; the bracketed surplus column carries $p_0(0)$, $q_0(0)$, 0, . . ., $p_4(0)$.] 

The scheme comes automatically to a halt whenever 
the first $p_k$ vanishes in all its components. If the 
vector $b_0$ has no “blind spots” in the direction of any 
of the principal axes, then the scheme will continue 
until $k = n$, and the first $p_k$ that vanishes will be $p_n$. 
This is $p_4$ in our example, since $n = 4$. The element 
in the bracketed column associated with $p_4$ is 5. 
Hence the determinant of the given system is estab- 
lished as 5. 

Numerical checks. The algorithm provides the 
following powerful checks for the numerical calcula- 
tions: 

(a) The dot-product of any two different p-vectors 
is zero. 

(b) The dot-product of any q-vector with any q'- 
vector, except its own pair, is zero. 

(c) Within each cycle the scalar $h_k'$ can be obtained 
in two different ways: $h_k' = p_k q_k' = q_k q_k'$. 

If we are interested in obtaining the characteristic 
equation of the matrix, we proceed in identical fash- 
ion with the only difference that we put in the brack- 
eted column opposite to $q_k'$ not zero but the algebraic 
quantity $\lambda$ times the element immediately above it. 
In our example, if we write the successive vertical 
elements of each cycle horizontally, the bracketed 
column becomes: 

2—4d+ 2%, 24—407+ 15 


~2 j Sox — 38X24 r’, —lp 4 52x—43X? 4-3 


_— Le + 53 a3... An +. 
i: 5§—20A+21\A?—8A*+ MM. 


The last polynomial is the characteristic polynomial 
whose roots give the eigenvalues A, of the matrix. 
The significance of the last column n, will be ex- 
plained in the next chapter. 
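As a hedged cross-check of the characteristic polynomial $5 - 20x + 21x^2 - 8x^3 + x^4$, the sketch below assumes that the example matrix is the 4×4 tridiagonal matrix with 2 on the diagonal and $-1$ on the neighboring diagonals. That matrix is not reproduced in this part of the text; it is an assumption, chosen because it agrees with the determinant 5 and with the row sums and solutions quoted later. The helper names `det`, `char_poly_value`, and `quoted_poly` are illustrative only.

```python
from fractions import Fraction

# Assumed example matrix (hypothetical reconstruction, see lead-in).
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]


def det(M):
    """Exact determinant by Gaussian elimination with row exchanges."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def char_poly_value(M, x):
    """Evaluate det(M - x I) at a given x."""
    n = len(M)
    return det([[M[i][j] - (x if i == j else 0) for j in range(n)]
                for i in range(n)])


def quoted_poly(x):
    """The last polynomial of the bracketed column."""
    return 5 - 20 * x + 21 * x**2 - 8 * x**3 + x**4


# Two quartics that agree at five distinct points are identical.
samples = [Fraction(k) for k in range(5)]
agree = all(char_poly_value(A, x) == quoted_poly(x) for x in samples)
print(agree, det(A))
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding.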


4. Solution of the Linear System by the 
q-Expansion 


So far we have constructed the two vector sets $p_i$ and $q_i$, which characterize the method of minimized iterations. Our aim is, however, to obtain the solution $y$ of the given linear set. For this purpose we assume that the vector $y$ is expanded into the $q$-vectors:

$$y = \sum_{i=0}^{n-1} \eta_i q_i. \tag{35}$$

We now form the equation

$$Ay = b_0 = p_0 \tag{36}$$

for the right side of (35). Making use of the first equation of the fundamental recurrence relation (18), we obtain the following recurrence set for the coefficients $\eta_i$ of the expansion (35):

$$-\rho_0\eta_0 = 1, \qquad -\rho_1\eta_1 + \eta_0 = 0, \qquad \ldots, \qquad -\rho_{i+1}\eta_{i+1} + \eta_i = 0. \tag{37}$$

Hence

$$\eta_{i+1} = \frac{\eta_i}{\rho_{i+1}}, \tag{38}$$

starting with

$$\eta_0 = -\frac{1}{\rho_0}.$$

In solved form

$$\eta_i = -\frac{1}{p_{i+1}(0)}.$$








The vector equation (35), if translated into matrix language, has the following significance. Write the $\eta_i$ as a column vector and multiply this column with the successive columns of the matrix $Q$, formed out of the middle vectors $q_i$ of the iteration scheme (34). For this reason the numerical scheme (34) is augmented by a last column, composed of the successive $\eta_i$, and written down in the corresponding rows of the vectors $q_i$. We find in our numerical scheme the element

$$\eta_0 = -\frac{1}{\rho_0} = \frac{3}{2}$$

in the row $q_0$, the element

$$\eta_1 = \frac{\eta_0}{\rho_1} = -\frac{5}{7}$$

in the row $q_1$, and so on. Multiplication of this column by the successive columns of the $q_i$ yields the successive components of the solution $y$:

$$y = \tfrac{9}{5},\ \tfrac{13}{5},\ \tfrac{12}{5},\ \tfrac{6}{5}.$$

Substitution into the original equation shows that this is indeed the correct solution.

If we do not carry the bracketed surplus column of our scheme, then it is convenient to generate the $\eta_i$ in succession on the basis of the recursion (38), writing each $\eta_i$ in line with the vector $q_i$. If the bracketed column is at our disposal, then we merely take the negative reciprocal of the first bracketed element in each cycle and transfer it to the $q_i$ immediately preceding it. For example, the first element of cycle 1 in the bracketed column is $-\tfrac{2}{3}$; the negative reciprocal is $\tfrac{3}{2}$, which is transferred to the middle line of the previous cycle. Then $\tfrac{7}{5}$ is transferred as $-\tfrac{5}{7}$ to the middle line of the previous cycle, and so on, until all the first elements of the bracketed column are exhausted, the last $\eta_{n-1}$ being the reciprocal of the determinant $|A|$. The sign of the $\eta_i$ always alternates between $+$ and $-$.

The objection may be raised that the vectors $p_i$ and $q_i$ have no invariant significance in relation to the matrix $A$. They depend on $b_0$ and thus, while we did get the solution of the given linear set, yet the matrix inversion gives much more because it is immediately applicable to any given right side $b_0$.

Now the remarkable fact holds that actually our $p_i$, $q_i$, although generated with the help of some specific $b_0$, nevertheless include the solution of a linear set with any given right side $c$. The right side of the equations (37) is 1, 0, 0, ..., only because the vector $b_0$, analyzed in the reference system of the $p_i$, has these components. Since, however, the $p_i$ form an orthogonal set of vectors, we can immediately analyze any given $c$ in this frame of reference. The components of $c$ in this system become

$$\mu_0 = \frac{c\,p_0}{h_0}, \qquad \mu_1 = \frac{c\,p_1}{h_1}, \qquad \mu_2 = \frac{c\,p_2}{h_2}, \qquad \ldots,$$

generally

$$\mu_i = \frac{c\,p_i}{h_i}, \tag{41}$$

and these are the quantities that in the general case appear on the right side of (37):

$$-\rho_0\eta_0 = \mu_0, \qquad -\rho_1\eta_1 + \eta_0 = \mu_1, \qquad \ldots, \qquad -\rho_{i+1}\eta_{i+1} + \eta_i = \mu_{i+1}. \tag{42}$$

This set is again readily solvable by recursions. Then after obtaining the vector $\eta$, we obtain $y$ once more by (39).
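The procedure for an arbitrary right side can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tabular scheme itself: the sign convention $\rho_i = -h_i'/h_i$, $\sigma_i = -h_{i+1}/h_i'$ is inferred from the recurrence set (37), and the 4×4 tridiagonal matrix and the starting vector $b_0 = (1, 1, 1, 0)$ are assumed reconstructions of the example of section 3. The function names `build_scheme` and `solve` are hypothetical.

```python
from fractions import Fraction as Fr


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_scheme(A, b0):
    """Generate the vectors p_i, q_i and the factors rho_i from b0."""
    p = [Fr(x) for x in b0]
    q = p[:]
    ps, qs, rhos = [], [], []
    for _ in range(len(b0)):
        h = dot(p, p)
        if h == 0:                      # premature termination: b0 is deficient
            break
        Aq = matvec(A, q)               # the vector q_i'
        h_prime = dot(q, Aq)
        rho = -h_prime / h              # assumed sign convention
        ps.append(p)
        qs.append(q)
        rhos.append(rho)
        p_next = [a + rho * b for a, b in zip(Aq, p)]
        sigma = -dot(p_next, p_next) / h_prime
        q = [a + sigma * b for a, b in zip(p_next, q)]
        p = p_next
    return ps, qs, rhos


def solve(A, b0, c):
    """Solve A y = c using vectors generated from a different right side b0."""
    ps, qs, rhos = build_scheme(A, b0)
    mus = [dot(c, p) / dot(p, p) for p in ps]        # components (41)
    etas, prev = [], Fr(0)
    for mu, rho in zip(mus, rhos):                   # recursion (42)
        prev = (prev - mu) / rho
        etas.append(prev)
    return [sum(e * q[j] for e, q in zip(etas, qs)) for j in range(len(c))]


A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
b0 = [1, 1, 1, 0]
print(solve(A, b0, b0))
print(solve(A, b0, [0, 0, 0, 1]))
```

The same vectors $p_i$, $q_i$ serve both calls; only the components $\mu_i$ change with the right side.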

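Example. In our numerical example let us replace the right side by

$$c = 0, 0, 0, 1.$$

The dot-products of this $c$ with the vectors $p_i$, divided by $h_i$, become:

$$\mu_0 = 0, \qquad \mu_1 = -\tfrac{3}{5}, \qquad \mu_2 = \tfrac{3}{7}, \qquad \mu_3 = 1.$$

Solution of (42) gives:

$$\eta_0 = 0, \qquad \eta_1 = -\tfrac{2}{7}, \qquad \eta_2 = \tfrac{1}{2}, \qquad \eta_3 = \tfrac{1}{5}.$$

Hence the solution of the linear set

$$Ay = c \tag{43}$$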


with any given right side $c$ is obtainable if we first construct the $p_i$, $q_i$ vectors with the help of some definite $b_0$, which can be arbitrary except for the fact that it shall have no blind spots in the direction of any of the principal axes. If $b_0$ is deficient in the direction of $m$ axes of $A$, then the iteration scheme will come to an end after $n-m$ iterations. This will necessarily happen if the matrix $A$ has multiple eigenvalues, no matter how $b_0$ was chosen. Let a certain $\lambda_i$ have the multiplicity $\mu$. Then there is a $\mu$-dimensional subspace in which the direction of the principal axes is undetermined. Let us project $b_0$ into this subspace. We get a definite vector which may be chosen as one of the principal axes. Then $b_0$ is still deficient in the other $\mu - 1$ possible orthogonal axes.

From this viewpoint the premature termination of our scheme can always be conceived as a consequence of the deficiency of $b_0$, no matter whether that deficiency originates in the accidental degeneracy of $b_0$, or in the degeneracy of the matrix $A$. Whenever this situation is encountered, we do not obtain a full solution of the equation (43). Yet we have obtained a preliminary $y^{(1)}$ which solves the equation at least in all the nondeficient directions. If we then form $Ay^{(1)} - c = c^{(1)}$, this $c^{(1)}$ will contain only dimensions which before did not come into evidence. We can now repeat the scheme (34) once more, using $c^{(1)}$ as the $b_0$ of the new scheme; we obtain a new set of $p_i$, $q_i$ vectors which can be added to the previous set. Assuming that $c^{(1)}$ does not bring in newer deficiencies relative to the previously omitted subspace, we will now have a complete set of $p_i$, $q_i$ vectors which include the entire space. If some dimensions are still omitted, the procedure can be continued, until all $n$ dimensions of the vector space are exhausted.

The outstanding feature of the recurrence relations (37) and (42) is the fact that they are two-term relations. This has the following remarkable consequence. We have pointed out before that we can consider the successive stages of our iteration process as a succession of approximations. At every step of the process we can form the ratios (11) or (8) and thus obtain approximations $y_k$ and $\bar y_k$, which come nearer and nearer to the true solution as the residual diminishes. Now the set (42) shows that this successive approximation process does not need constant readjustments as we go from $k$ to $k+1$. The previous approximation remains unchanged, we merely add one more vector, namely, $\eta_{k+1}q_{k+1}$.

The expansion (35) into the $q$-vectors thus imitates the behavior of an orthogonal expansion whose coefficients remain unchanged as we gradually introduce more and more vectors of the function space until finally all dimensions are exhausted. This shows the superiority of the $q$-vectors for expansion purposes. If the vectors $p_i$ are used, the relations involve three-term recurrences and we cannot solve the set by one single recursion, but need the proper linear combination of two recursions; this involves constant modification of the approximation previously obtained.

If we pursue our procedure as a sequence of successive approximations which may be terminated at any point where the residual has dropped down below a preassigned limit, it will be important to obtain not only the subsequent corrections, but also the remaining residual. This residual is directly available. The remaining residual, that is, right side minus left side of the linear system after substituting the $k$th approximation $y_k$, is simply given by the quantity

$$r_{k+1} = -\eta_k p_{k+1}. \tag{44}$$







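For example, if in our numerical scheme we stop at $k = 2$, we obtain the approximation

$$y_2 = \tfrac{13}{7},\ \tfrac{18}{7},\ \tfrac{33}{14},\ \tfrac{17}{14}.$$

The residual associated with this approximation is thus

$$r_3 = -\eta_2 p_3 = -\tfrac{1}{7},\ \tfrac{1}{14},\ \tfrac{1}{14},\ -\tfrac{1}{14},$$

which can be verified by substitution.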


By merely watching the last two columns of our scheme we can constantly keep track of the successive whittling down of the residual. The length of the remaining residual is obtained by multiplying the last $\eta_i$ by the square root of the next following $h_i$ (we skip $h_i'$). For example, in our numerical problem the lengths of the successive residuals become:

$$\tfrac{3}{2}\sqrt{\tfrac{5}{3}} = 1.9365, \qquad \tfrac{5}{7}\sqrt{\tfrac{7}{5}} = 0.8452, \qquad \tfrac{1}{2}\sqrt{\tfrac{1}{7}} = 0.1890, \qquad \tfrac{1}{5}\cdot 0 = 0.$$
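The residual lengths listed above can be reproduced independently. The sketch below does not follow the tabular scheme of the text; it swaps in the standard conjugate-gradient recurrences, which for a symmetric positive definite matrix produce the same successive approximations in exact arithmetic. The matrix and the right side $b_0 = (1, 1, 1, 0)$ are assumed reconstructions of the example of section 3, and `cg_trace` is a hypothetical helper name.

```python
from fractions import Fraction as Fr
import math


def cg_trace(A, b, steps):
    """Conjugate-gradient iteration in exact rational arithmetic.

    Returns the final iterate and the squared residual length after each step.
    """
    x = [Fr(0)] * len(b)
    r = [Fr(v) for v in b]
    d = r[:]
    rr = sum(v * v for v in r)
    norms_sq = []
    for _ in range(steps):
        if rr == 0:                      # residual already annihilated
            break
        Ad = [sum(a * v for a, v in zip(row, d)) for row in A]
        alpha = rr / sum(u * v for u, v in zip(d, Ad))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ad)]
        rr_new = sum(v * v for v in r)
        norms_sq.append(rr_new)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x, norms_sq


A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
b0 = [1, 1, 1, 0]
x, norms_sq = cg_trace(A, b0, 4)
lengths = [round(math.sqrt(v), 4) for v in norms_sq]
print(lengths)
print(x)
```

Under these assumptions the squared lengths come out as the exact fractions 15/4, 5/7, 1/28, and 0, so the decimal values are free of accumulated rounding.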
The simple expression of the residual (44) is of great advantage if we decide to use our process in "blocks" rather than as a continuous succession of operations. The accumulation of rounding errors tends to destroy the orthogonality of the $p_i$ more and more. If we do not want to take recourse to the lengthy process of constant reorthogonalization, we can break our operations in blocks as soon as we notice that the rounding errors have done too much damage to the orthogonality. In that case we evaluate the remaining residual and start the process independently over again. The accumulation of rounding errors is thus avoided, at the price of retarded convergence.

Now the expression (44) shows that very little adjustment is needed in order to change from the continuous technique to the block technique.

The residual of the last block serves as the initial vector of the new block. Now the residual of a block of $k+1$ cycles (the cycles being numbered as 0, 1, 2, ..., $k$) is $-\eta_k p_{k+1}$. In the continuous flow of operations the next cycle would have started with $p_{k+1}$. The changing over to independent blocks merely requires that we should multiply this vector by the negative value of the preceding $\eta_k$, but this is equivalent to the division by $p_{k+1}(0)$, which can be found in the surplus column of the same $p_{k+1}$ row.

Hence the change to the block technique merely requires that we should continue in the regular fashion up to the row

$$p_{k+1}, \quad [\,p_{k+1}(0)\,]$$

which terminates that block. The next block starts with

$$\frac{p_{k+1}}{p_{k+1}(0)}, \quad [\,1\,]$$

and we repeat under it once more

$$\frac{p_{k+1}}{p_{k+1}(0)}, \quad [\,1\,].$$

These are the $p_0$, $q_0$ of the new block, and now we continue with the scheme in the regular fashion, until the next block is exhausted, and so on.

The solution itself is obtained exactly as before, by transferring the $-1/p_{k+1}(0)$ to the row of the $q_k$ and then adding up the contributions of all the $q_i$.

We see that the block technique does not require essentially more work than the continuous technique, except that the total number of cycles needed for a certain accuracy is increased, compared with the continuous technique constantly corrected by reorthogonalization.

If the right side $b_0$ is changed to some other given vector $c$, then special precaution is necessary due to the fact that we do not now possess a universal orthogonal reference system which includes the entire space, but each block provides its own partial reference system. We determine for the first block the $\mu_i$ according to (41) and then obtain the $\eta_i$ by the recursions (42). But coming to the second block we have to replace $c$ by the new vector $c^{(1)} = c - \sum \mu_i p_i$ and repeat the process of obtaining the $\mu_i$ and the $\eta_i$ for the new block with this new vector. Then we reduce similarly $c^{(1)}$ to $c^{(2)}$ for the next block and so on.

The duality of the vectors $p_i$, $q_i$ is mirrored by the duality of the two kinds of approximate solutions $y_k$ and $\bar y_k$, defined by (11) and (8). The recurrence relations (13) permit us to establish recurrence relations between these two sets of solutions. We perform the operations (11) and (8) in (13), replacing $x$ by $A$, and let these polynomials operate on $b_0$. This gives:

Peer (OY = Pe(O) Pee — Hs 


(45) 


Gers O)VYe =O) orYe1— Perr O)Ye. 


We can simplify these relations by introducing the 
proportional vectors 
yea (0) . 
Pr+t Y=) (46) 


Ve q 
PoPi +++ Pr 
since from (26), peri(O) = pom 


Ger O) 
er 
Hence we obtain 


PoPi-++ Pr+i 


- Pk+i 
Cr+ 


- PoPi.-- 
Cp 
ToT, «+. 


Yrrie 
The recurrences (48) and (49) start with 


qo 


Yo 





The recursion (48) expresses our previous solution 
(35), (37) in slightly different form. However, an 
additional approximation is now provided by the 
scheme (49) which generates the 7, by a process anal- 
ogous to that in (48). The vectors 7, are of value if 
we want a solution of smallest residual, since this 
solution is 7, and not y,. After obtaining 7, by the 
scheme (49), we can also obtain y, by multiplying 
by the constant oo, Or! Qari (O). 
The residual associated with 7, is given by 

des 


by. — A= : (51) 


Vet 


qe (0) 


and this is the absolutely smallest residual obtainable by $k$ iterations. In the previous numerical example the length of the residual associated with $y_1$ is 1.9365, which is larger than the original length 1.7321 of the vector $b_0$. The length of $\bar r_1$ associated with the solution $\bar y_1$, on the other hand, is

$$|\bar r_1| = \frac{|q_1|}{|q_1(0)|} = \frac{\sqrt{15}}{3} = 1.2910,$$

which is smaller than the original length.

The result is different, however, if we investigate the error of the solution, that is, $|y - y_k|$, rather than the magnitude of the residual, which is $|A(y - y_k)|$. The solution $y_k$ has the property to minimize $(y - y_k)A(y - y_k)$, while the solution $\bar y_k$ minimizes $|A(y - \bar y_k)|^2$. The first quantity is less biased compared with the direct error square $|y - y_k|^2$ than the second. Hence $y_k$ yields a smaller error in the solution, although a larger error in the residual than $\bar y_k$. To illustrate: in the numerical example the length of the vector $y - y_1$ is 1.884, while the length of the vector $y - \bar y_1$ is 3.0768. For this reason the vector $\bar y_k$ will usually be of smaller significance than the vector $y_k$.


5. The Preliminary Purification of the Vector $b_0$


In principle we have obtained a method for the solution of sets of linear equations which is simple and logical in structure. Yet from the numerical standpoint we must not overlook the danger of the possible accumulation of rounding errors. The theoretically demanded orthogonality of the vector set $p_i$ can be quickly lost if we do not watch out for rounding errors. Now we can effectively counteract the damaging influence of rounding errors by constantly orthogonalizing every new $p_k$ to all the previously obtained $p_i$. We do that by a correction scheme described in the earlier paper [14, p. 271, (60)].

This constant orthogonalization, however, is a lengthy process which basically destroys the simplicity of the generation of every new $p_k$ and $q_k$ by using only two of the earlier vectors. In order to make the corrections, all the previous $p_i$ have to be constantly employed.

This consideration indicates that it will be advisable not to overstress our algorithm to too great a length. If by some means fast convergence can be enforced, the scheme might terminate in much fewer than $n$ steps. Even if theoretically speaking the last vector vanishes exactly only after $n$ iterations, it is quite possible that it may drop practically below negligible bounds after a relatively few iterations.

We can predict in advance under what conditions we may expect fast convergence. If we want the scheme to terminate after less than $n$ steps, it is necessary and sufficient that the vector $b_0$ shall be deficient in the direction of certain axes. The more "blind spots" the vector $b_0$ has in the direction of various principal axes, the quicker will the scheme terminate.

In the practical sense it will not be necessary that $b_0$ shall be exactly deficient in certain axes. It will suffice if the components of $b_0$ in the direction of certain principal axes are small. Strong convergence in this sense means that we shall reduce the components of $b_0$ in as many axes as possible.

That such a "purification" of $b_0$ of many of its components is actually possible is shown by the Sylvester-Cayley procedure by which the largest eigenvalue and associated eigenvector of a matrix may be obtained [8, p. 134]. In principle any linear set of equations is solvable by the Sylvester-Cayley procedure. Indeed, let us homogenize the linear system (29) by completing the matrix $G$ by an $(n+1)$st column defined as $-g$, and an $(n+1)$st row defined as identically zero. Then the linear eq (29) can now be formulated in the homogeneous form

$$G_1 y_1 = 0, \tag{52}$$

where

$$G_1 = (G, -g), \qquad y_1 = \alpha(y, 1),$$

where $\alpha$ is any nonzero constant.

We now consider (52) as the solution of the following least-square problem. Minimize

$$(G_1 y_1)^2$$

under the auxiliary condition

$$y_1^2 = 1.$$

The solution of this minimum problem is the principal axis problem

$$A_1 y_1 - \lambda y_1 = 0, \tag{56}$$

where

$$A_1 = G_1^* G_1. \tag{57}$$

We are particularly interested in the principal axis associated with the smallest eigenvalue

$$\lambda = 0. \tag{58}$$

Let us now assume that we somehow estimated the largest eigenvalue $\lambda_M$ of the nonnegative matrix $A_1$. Then the matrix

$$A_2 = \lambda_M I - A_1 \tag{59}$$


is an $(n+1)$-dimensional nonnegative matrix whose largest eigenvalue

$$\lambda = \lambda_M \tag{60}$$

corresponds to the zero eigenvalue of $A_1$.

Now the Sylvester-Cayley asymptotic method consists in choosing an arbitrary trial vector $b_0$ which has to satisfy the one condition that it shall not be deficient in the direction of the eigenvector connected with the largest eigenvalue $\lambda_M$. We now form the sequence

$$b^0 = b_0, \qquad b^1 = A_2 b_0, \qquad b^2 = A_2 b^1 = A_2^2 b_0, \qquad \ldots$$

and obtain asymptotically

$$y_1 = b^{\infty}. \tag{61}$$

This method is of great theoretical importance, even if it often converges too slowly to be useful numerically. A proper refinement of the method, however, will make it well adapted to our present aims.

For the purpose of making the Sylvester-Cayley procedure more effective, let us analyze the problem in the reference system of the principal axes of the matrix $A_2$. Let us first normalize the largest eigenvalue to 1 by dividing $A_2$ by $\lambda_M$. We thus want to operate with the matrix

$$A_0 = \frac{A_2}{\lambda_M}, \tag{62}$$

whose eigenvalues lie between 0 and 1.

In the reference system of the principal axes the trial vector $b_0$ shall have the components

$$\beta_{1,0},\ \beta_{2,0},\ \ldots,\ \beta_{n,0},\ \beta_{n+1,0}, \tag{63}$$

assuming that the eigenvalues $\lambda_i$ are arranged according to increasing order of magnitude. The operation $b^m = A_0^m b_0$ generates the vector

$$\beta_{1,0}\lambda_1^m,\ \beta_{2,0}\lambda_2^m,\ \ldots,\ \beta_{n+1,0}\lambda_{n+1}^m. \tag{64}$$

Now

$$\lambda_{n+1} = 1, \qquad \lambda_i < 1 \quad (i = 1, 2, \ldots, n). \tag{65}$$

Hence, as $m$ grows to infinity, we get in the limit

$$\lambda_i^m \to 0 \quad (i = 1, 2, \ldots, n), \tag{66}$$

while $\lambda_{n+1}^m$ remains constantly equal to 1. We thus get in the limit the vector

$$0,\ 0,\ \ldots,\ \beta_{n+1,0}, \tag{67}$$

which differs only by a factor of proportionality from the vector

$$0,\ 0,\ \ldots,\ 0,\ 1. \tag{68}$$

This, however, is the principal axis associated with the largest eigenvalue $\lambda_{n+1} = 1$.

While this method works very well in obliterating the small eigenvalues, it becomes very inefficient for a $\lambda_i$ which is near to 1.

Taking our lead from the Hamilton-Cayley procedure we will now approach the problem from a more general viewpoint. We go back to our original matrix $A$ and the given right side $b_0$. Instead of considering a mere power $b^m$, we will consider an arbitrary polynomial $P_m(A)$ operating on $b_0$. For the sake of convenience we will once more introduce the reference system of the principal axes and we will once more normalize the largest eigenvalue of $A$ to 1 by introducing the new matrix

$$A_0 = \frac{1}{\lambda_M} A,$$

where $\lambda_M$ is the largest eigenvalue of $A$. Our aim is to solve the equation

$$A_0 y = b^0, \tag{71}$$

where we have put

$$b^0 = \frac{b_0}{\lambda_M}.$$

Instead of the exact solution we consider an approximation $\bar y$ obtained by letting some polynomial $P_m(A_0)$ operate on $b^0$. This leads to a residual vector

$$r_{m+1} = [1 - A_0 P_m(A_0)]\, b^0 \tag{72}$$

and our aim is to reduce $r_{m+1}$ to a small quantity. Instead of $P_m(x)$ let us consider the polynomial of one higher order

$$F_{m+1}(x) = 1 - x P_m(x).$$

Apart from the boundary condition

$$F(0) = 1 \tag{74}$$

$F(x)$ may be chosen as an arbitrary polynomial of the order $m+1$.

At this point we want to establish a definite measure $\epsilon_{m+1}$ for the closeness of our approximation. We define $\epsilon_{m+1}$ as the ratio of the length of the residual vector $r_{m+1}$ to the length of the correct solution $y$:

$$\epsilon_{m+1} = \frac{|r_{m+1}|}{|y|}.$$

Let us now discuss our problem in the reference system of the principal axes. The components of $y$ in this system shall be denoted by

$$y_{1,0},\ y_{2,0},\ \ldots,\ y_{n,0}. \tag{76}$$

Then the components of $b^0$ become

$$\lambda_1 y_{1,0},\ \lambda_2 y_{2,0},\ \ldots,\ \lambda_n y_{n,0},$$

while the components of the vector $r$ become

$$F(\lambda_1)\lambda_1 y_{1,0},\ \ldots,\ F(\lambda_n)\lambda_n y_{n,0}.$$

Now by definition:

$$\epsilon^2 = \frac{\sum_{k=1}^{n} [\lambda_k F(\lambda_k)]^2\, y_{k,0}^2}{\sum_{k=1}^{n} y_{k,0}^2},$$

and the theorem of weighted means gives the estimation

$$\epsilon \le \max |\lambda_k F(\lambda_k)|. \tag{80}$$

Hence our aim must be to choose the polynomial $F(x)$ in such a fashion that the maxima of $xF(x)$ shall remain uniformly small in the interval between 0 and 1, which covers the entire range of the $\lambda_i$.
We make our choice as follows. We introduce the Chebyshev polynomials $T_n(x)$,* normalized to the range 0 to 1 [13, p. 140]. These polynomials are defined by [19, p. 3]

$$T_n(x) = \cos n\theta \tag{81}$$

with

$$x = \frac{1 - \cos\theta}{2}. \tag{82}$$

We now put

$$F_{m+1}(x) = \frac{1 - T_{m+2}(x)}{2(m+2)^2 x} = \frac{\sin^2 (m+2)\frac{\theta}{2}}{(m+2)^2 \sin^2 \frac{\theta}{2}} \tag{83}$$

and notice that the quantity

$$x F_{m+1}(x) \tag{84}$$

is bounded by

$$\frac{1}{(m+2)^2} \tag{85}$$

throughout the range $0 \le x \le 1$. Hence

$$\epsilon_{m+1} \le \frac{1}{(m+2)^2}. \tag{86}$$
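The bound on $xF_{m+1}(x)$ can be checked numerically. The following sketch evaluates the shifted Chebyshev polynomials by their three-term recursion and samples $xF_{m+1}(x)$ on a grid; the grid size and the names `cheb01` and `F_poly` are illustrative choices, not part of the original method.

```python
def cheb01(n, x):
    """Chebyshev polynomial T_n normalized to [0, 1]: T_n = cos(n*theta), x = (1 - cos(theta))/2."""
    t_prev, t = 1.0, 1.0 - 2.0 * x          # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * (1.0 - 2.0 * x) * t - t_prev
    return t


def F_poly(m, x):
    """F_{m+1}(x) = (1 - T_{m+2}(x)) / (2 (m+2)^2 x), for x > 0."""
    return (1.0 - cheb01(m + 2, x)) / (2.0 * (m + 2) ** 2 * x)


m = 5
grid = [k / 1000 for k in range(1, 1001)]
worst = max(x * F_poly(m, x) for x in grid)
bound = 1.0 / (m + 2) ** 2
print(worst, bound)
```

Near $x = 0$ the same function stays close to 1, in agreement with the boundary condition $F(0) = 1$.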


Since we have made our choice $F_{m+1}(x)$, the corresponding approximate solution

$$w_m = \frac{1 - F_{m+1}(A_0)}{A_0}\, b^0 \tag{87}$$

is uniquely determined. We introduce the polynomials

$$g_m(x) = \frac{(m+2)^2}{4}\,\frac{1 - F_{m+1}(x)}{x} = \frac{T_{m+2}(x) + 2(m+2)^2 x - 1}{8x^2}, \tag{88}$$


* The use of the Chebyshev polynomials for the solution of linear systems has been suggested at various times [4]. The author is not aware that the specific method here recommended has been suggested before.


which have integer coefficients. For the sake of convenience we list the first five $g_m(x)$ polynomials:

$$
\begin{aligned}
g_0(x) &= 1 \\
g_1(x) &= 6 - 4x \\
g_2(x) &= 20 - 32x + 16x^2 \\
g_3(x) &= 50 - 140x + 160x^2 - 64x^3 \\
g_4(x) &= 105 - 448x + 864x^2 - 768x^3 + 256x^4 \\
g_5(x) &= 196 - 1176x + 3360x^2 - 4928x^3 + 3584x^4 - 1024x^5
\end{aligned}
$$


This table is actually not needed for the generation of the successive vectors $g_m(A_0)b^0$. We can obtain these vectors much more elegantly and with smaller rounding errors by a simple recursion scheme. We start out with the recursion formula of the Chebyshev polynomials, normalized to the range 0 to 1:

$$T_{n+1}(x) = 2(1 - 2x)\,T_n(x) - T_{n-1}(x), \tag{89}$$

and obtain for the polynomials $g_m(x)$ the following recursion relation:

$$g_{m+1}(x) = 2(1 - 2x)\,g_m(x) - g_{m-1}(x) + (m+2)^2, \tag{90}$$

starting with

$$g_0(x) = 1, \tag{91}$$

$$g_1(x) = 2(1 - 2x) + 4 = 6 - 4x. \tag{92}$$
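The recursion for the $g_m(x)$ can be run directly on coefficient lists, which gives an independent check of the tabulated polynomials. This is a small sketch; the representation (coefficients ordered from the constant term upward) and the names `next_g` and `g_table` are illustrative assumptions.

```python
def next_g(g_m, g_prev, m):
    """Coefficients of g_{m+1} = 2(1 - 2x) g_m - g_{m-1} + (m + 2)^2, lowest power first."""
    out = [0] * (len(g_m) + 1)
    for k, c in enumerate(g_m):
        out[k] += 2 * c          # contribution of 2 * g_m
        out[k + 1] -= 4 * c      # contribution of -4x * g_m
    for k, c in enumerate(g_prev):
        out[k] -= c              # subtract g_{m-1}
    out[0] += (m + 2) ** 2       # constant term (m + 2)^2
    return out


def g_table(count):
    """Return the coefficient lists of g_0, ..., g_{count-1}."""
    table = [[1], [6, -4]]
    while len(table) < count:
        m = len(table) - 1
        table.append(next_g(table[-1], table[-2], m))
    return table[:count]


for m, coeffs in enumerate(g_table(6)):
    print(m, coeffs)
```

All coefficients stay integers because the recursion only multiplies by integers and adds the integer $(m+2)^2$.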
In order to utilize this relation for the generation of the vectors $g_m(A_0)b^0$, we introduce the matrix

$$B = 2I - 4A_0 \tag{93}$$

and obtain the generating scheme

$$g_{m+1} = Bg_m - g_{m-1} + (m+2)^2 b^0, \tag{94}$$

starting with

$$g_0 = b^0, \qquad g_1 = Bg_0 + 4b^0. \tag{95}$$

The last term of (94) can be absorbed in the simpler recursion formula

$$\bar g_{m+1} = \bar B \bar g_m - g_{m-1} \tag{96}$$

if we agree to operate again with a surplus column similar to that used in our previous numerical example (cf. the bracketed column of the numerical scheme of section 3). We extend the matrix $B$ by an $(n+1)$st column for which we choose the given right side $b^0$:

$$\bar B = B,\ b^0. \tag{97}$$

Moreover, we extend the vectors $g_m$ by a surplus element, defined as the integer $(m+2)^2$:

$$\bar g_m = g_m,\ (m+2)^2. \tag{98}$$

The surplus column of the vector scheme $g_m$ can be filled out in advance by the squares of the integers, starting with 4, 9, 16, ..., in contrast to the bracketed column of the previous scheme, which was filled out as the scheme unfolded itself. The surplus column of the matrix $\bar B$ and the surplus elements of the vectors $\bar g_m$ participate solely in the formation of the product $\bar B \bar g_m$, but have no effect on the subtraction of $g_{m-1}$, which is subtracted without its surplus element.

The definition of the $g_m(x)$ polynomials shows that the approximate solution (87) is in the following relation to the vectors $g_m$ just generated:

$$w_m = \frac{4}{(m+2)^2}\, g_m. \tag{99}$$

Moreover, if we want to find the residual vector associated with the solution $w_m$, we have to form

$$r_{m+1} = b^0 - A_0 w_m = \frac{1}{(m+2)^2}\left[(m+2)^2 b^0 - 4A_0 g_m\right] = \frac{1}{(m+2)^2}\left(\bar B \bar g_m - 2g_m\right). \tag{100}$$

The last equation allows the following interpretation. Let us assume that at a certain $m$ we want to terminate our process. We will now want to know how much the remaining residual is. For this purpose we merely add one more iteration according to (96); then the quantities required in (100) are available with the only modification that instead of subtracting $g_{m-1}$ we subtract $2g_m$. This vector, divided by $(m+2)^2$, gives the residual $r_{m+1}$.

Numerical example. The following illustrative example is chosen to demonstrate the operation of the method. Our matrix $A$ is once more the matrix of the numerical example of section 3. The right side is chosen as $b_0 = 0, 0, 0, 4$.

Estimation of the largest eigenvalue $\lambda_M$. The largest eigenvalue of a matrix can be estimated by the method of Geršgorin [9] (cf. also [3] and [20]). Even if this estimation is not always very close, it gives a definite upper bound for $\lambda_M$ by a very simple test. Such an estimate is what we need, since an overestimation of $\lambda_M$ merely makes the largest eigenvalue smaller than 1. The only thing we have to avoid is a $\lambda_M$ larger than 1, because then we would overstep the region where the Chebyshev polynomials are bounded by unity in absolute value.


The method of Geršgorin, restricted for our case to the estimation of the largest eigenvalue, is based on the definition of the eigenvectors of a matrix $A$ by the equations

$$\sum_{k} a_{ik} x_k = \lambda x_i. \tag{101}$$

We consider only one equation of the given set, picking out that particular index $i$ which belongs to the absolutely largest $x_i$. We now divide by $x_i$ on both sides of the equation. Since $|x_k / x_i| \le 1$, we find at once

$$|\lambda| \le \sum_{k} |a_{ik}|. \tag{102}$$

Hence the absolute value of our chosen $\lambda$ is smaller than the sum of the absolute values of the elements of some row (or column). Now we can evaluate the sum of the absolute values of all the elements for each row (or column) and select the maximum of this sequence of $n$ numbers. Then we know that for any $\lambda_i$ the absolute value of $\lambda_i$ cannot surpass this sum. We thus obtain the estimate

$$|\lambda_M| \le S_M, \tag{103}$$

where $S_M$ is the maximum among the sums of the absolute values of all the elements of the rows 1, 2, ..., $n$.
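The estimate amounts to taking the largest absolute row sum. A minimal sketch follows; the 4×4 tridiagonal matrix used here is an assumed reconstruction of the example matrix of section 3, and `gershgorin_bound` is a hypothetical helper name.

```python
def row_sums(A):
    """Sum of the absolute values of the elements of each row."""
    return [sum(abs(v) for v in row) for row in A]


def gershgorin_bound(A):
    """Upper bound S_M for the absolute value of every eigenvalue of A."""
    return max(row_sums(A))


# Assumed example matrix (see lead-in).
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

print(row_sums(A), gershgorin_bound(A))
```

An overestimate is harmless for the purification scheme, since it only pushes the normalized eigenvalues further below 1.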

It was pointed out before that the actual generation of the symmetrized matrix $A = G^*G$, which is a numerically heavy load, is not demanded, since all our operations can be performed with the help of $G$ and $G^*$ alone. But then it becomes necessary to estimate the largest eigenvalue of $A$ by utilizing $G$ and $G^*$ only, without generating the elements of $A$.

We assume the general case that $G$ has arbitrary complex elements and conceive $G$ as the sum of two Hermitian matrices $G'$ and $G''$, defined by


(104) 


(the symbol ~ means conjugate complex). Then 


$$G = G' + iG'', \qquad G^* = G' - iG''. \tag{105}$$

Hence

$$G^*G = (G')^2 + (G'')^2 + i(G'G'' - G''G'). \tag{106}$$

Now the largest eigenvalue of a positive definite Hermitian matrix $A$ can be defined as the largest possible length of any vector $Ab_0$, where $|b_0| = 1$. In order to find this largest length, we let the eq (106) operate on $b_0$. We thus obtain the estimate

$$\lambda_M \le \lambda_M'^2 + \lambda_M''^2 + 2\lambda_M'\lambda_M'', \qquad \lambda_M \le (\lambda_M' + \lambda_M'')^2, \tag{107}$$

where $\lambda_M'$ is the largest eigenvalue of $G'$ and $\lambda_M''$ the largest eigenvalue of $G''$. Since $\lambda_M'$ and $\lambda_M''$ can be estimated by Geršgorin's theorem, we thus obtain an upper bound for $\lambda_M$, without using the elements of the least-squared matrix $A$.


In our simple numerical example the given matrix is already symmetric and positive definite. We can thus operate directly with $A$. The sums of the absolute values of each row are 3, 4, 4, 3. Hence we can choose $\lambda_M = 4$ as a safe estimate of the largest eigenvalue.

We construct the matrix $B$ according to (93), and extend it by the column $b^0 = \tfrac{1}{4} b_0 = 0, 0, 0, 1$. We choose $m = 5$, and continue the scheme by one more row to obtain the new residual. The factor $(m+2)^2$ is in our case 49. Hence the fifth row has to be multiplied by 4/49 in order to obtain the approximate solution $w_5$, while the sixth row $s_6$ has to be multiplied by 1/49 in order to get the residual $r_6$.

[Numerical scheme: rows $g_0, \ldots, g_5$ with the surplus column $(m+2)^2$, followed by the row $s_6$.]

The last row was obtained by multiplying the row 5 by the matrix $\bar B$ and then subtracting the row 5 (and not 4) twice.

We can test the residual estimate (86) on our scheme. According to this estimate $(m+2)^2 \epsilon_{m+1}$ must become smaller than 1. If the vector $g_m$, multiplied by $4/(m+2)^2$, is a fairly good approximation of $y$, then the length of the vector $s_{m+1}$ cannot surpass $4/(m+2)^2$ of the length of $g_m$. In our case ($m = 5$): $|g_5| = 46.34$, while $|s_6| = 2.24$. Hence

$$\frac{|s_6|}{|g_5|} = 0.048 < \frac{4}{7^2} = 0.081633. \tag{108}$$

If this test fails, it is an indication that our approximation is far from the correct solution, caused by the influence of the small eigenvalues, as we will show presently.











The approximation $w_5$ is obtained by multiplying row 5 by 0.081633. This gives $w_5 = 0.65306, 1.30613, 2.04082, 2.93879$.

The correct solution is $y = 0.8, 1.6, 2.4, 3.2$.
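The generating scheme (94) and the relations (99) and (100) can be traced in a short program. The sketch below is a hedged illustration: the tridiagonal matrix is an assumed reconstruction of the example matrix of section 3, the surplus column is folded into an explicit term $(k+2)^2 b^0$ instead of being carried as an extra column, and `purify` is a hypothetical name.

```python
from fractions import Fraction


def purify(A, b0, lam_M, m):
    """Run m steps of g_{k+1} = B g_k - g_{k-1} + (k+2)^2 b^0 and return g_m, w_m, r_{m+1}."""
    n = len(b0)
    b = [Fraction(v, lam_M) for v in b0]                      # b^0 = b_0 / lambda_M
    B = [[(2 if i == j else 0) - Fraction(4 * A[i][j], lam_M)  # B = 2I - 4 A_0
          for j in range(n)] for i in range(n)]

    def apply_B(v):
        return [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]

    g_prev = [Fraction(0)] * n
    g = b[:]                                                  # g_0 = b^0
    for k in range(m):
        Bg = apply_B(g)
        g_next = [Bg[i] - g_prev[i] + (k + 2) ** 2 * b[i] for i in range(n)]
        g_prev, g = g, g_next
    Bg = apply_B(g)
    s = [Bg[i] - 2 * g[i] + (m + 2) ** 2 * b[i] for i in range(n)]  # extra row
    w = [Fraction(4, (m + 2) ** 2) * v for v in g]            # relation (99)
    r = [v / (m + 2) ** 2 for v in s]                         # relation (100)
    return g, w, r


A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
g5, w5, r6 = purify(A, [0, 0, 0, 4], 4, 5)
print([int(v) for v in g5])
print([round(float(v), 5) for v in w5])
```

Under these assumptions the printed approximation agrees with the quoted $w_5$ to within a unit in the fifth decimal; the small differences come from multiplying by the rounded factor 0.081633 in the text.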

What did we accomplish with this algorithm? Let us analyze the situation in the reference system of the principal axes. Let us plot the eigenvalues $\lambda_i$, normalized to the range 0 to 1, along the abscissa, while we plot the components of the right side associated with a certain $\lambda_i$ as ordinates. In the language of physics we have a "line spectrum", since only certain definite "frequencies" $\lambda_i$, namely, the eigenvalues of $A$, are represented.

Whatever approximation scheme we may use, based on iterations, we will always obtain a preliminary solution $y_m$, which does not satisfy the equation exactly but generates a new right side in the form of a residual vector $r_{m+1}$. Hence quite generally for any iterative solution we will have

$$y_m = P_m(A_0)\, b^0, \tag{109}$$

where $P_m(x)$ is some polynomial in $x$. Then the residual $r_{m+1}$, associated with this solution, becomes

$$r_{m+1} = [1 - A_0 P_m(A_0)]\, b^0 = F_{m+1}(A_0)\, b^0. \tag{110}$$

This residual vector is then the new "right side" of the next approximation.

The result of our approximation can now be described as follows, if we view everything from the reference system of the principal axes. The original component $\beta_{i0}$, associated with the eigenvalue $\lambda_i$, became attenuated by the factor $r(\lambda_i)$, where the function $r(x)$ is defined by

$$r(x) = F_{m+1}(x). \tag{111}$$

In these discussions we have considered two kinds of approximations: the purification technique dealt with in the present section, and the method of minimized iterations, discussed before. Since the purification precedes the application of the algorithm technique given in section 3, let us call it algorithm I while the algorithm of section 3 shall be called algorithm II. The attenuation obtained by these two kinds of algorithms is based on two very different principles. We discuss the algorithm I first.

Here we get according to (83):

$r(x) = \frac{\sin^2\left[(m+2)\frac{\theta}{2}\right]}{(m+2)^2 \sin^2\frac{\theta}{2}},$ (112)

with

$x = \sin^2\frac{\theta}{2}.$ (113)


The attenuation thus obtained starts with 1 and falls off with $1/x$. The factor $r(x)$ cuts out effectively the higher frequencies but has little influence on the small frequencies (small $\lambda_i$). What we accomplish here is that we put the spotlight on the small eigenvalues, while the large eigenvalues can be eliminated to any desired degree.
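The behavior of the attenuation factor (112), (113) can be tabulated directly. The following is a minimal sketch, not part of the original scheme; the function name `attenuation` is hypothetical, and the value 1 at $x=0$ is the limiting value of the quotient.

```python
import math

def attenuation(x, m=5):
    """r(x) = sin^2((m+2)*phi) / ((m+2)^2 * sin^2(phi)), where x = sin^2(phi), phi = theta/2."""
    if x == 0.0:
        return 1.0                      # limiting value as x -> 0
    phi = math.asin(math.sqrt(x))
    n = m + 2
    return math.sin(n * phi) ** 2 / (n * n * x)

# Small x is left almost untouched; larger x is cut down to a few percent or less.
for x in (0.0001, 0.01, 0.1, 0.5, 1.0):
    r = attenuation(x)
    print(f"x = {x:<7} r = {r:.6f}   r^4 = {r ** 4:.3e}")
```

The last column shows the effect of raising the factor to the fourth power, which corresponds to repeating the process four times on the successive residuals.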

Actually this algorithm serves a double purpose. We limit the field of vision to a relatively narrow band of small eigenvalues. Aside from that, however, we can make the focusing effect of the process increasingly sharper. Let us limit ourselves to the case $m=5$, that is, to five iterations of the type described. We can now take the residual $r_6$ and repeat the process, thus obtaining a second "block" of five iterations. The attenuation factor achieved as the result of the two blocks of iterations is the square of the previous $r(\lambda)$. Generally, if the process is repeated $k$ times, the attenuation thus obtained is characterized by

$r^{(k)}(\lambda) = [r(\lambda)]^k.$

Figure 1 plots $r(\lambda)$ (for $m=5$) and the second, third, and fourth powers of $r(\lambda)$. If our matrix $A$ contains a very small eigenvalue of the order of 0.0001, say, this very small eigenvalue will not be able to compete with the larger eigenvalues, except if the larger eigenvalues are blotted out very strongly. At first sight we might think that from the standpoint of such a small $\lambda_i$ it makes no great difference how often we repeat the process, since it will remain in the illuminated part of the spectrum for a practically unlimited time even if $k$ is large. However, the situation is quite different if the algorithm I is conceived as a mere preparation to algorithm II. Then we are reconciled to the fact that our first efforts are unable to take out the contribution of that small eigenvalue. We leave that task to the second algorithm. But that second algorithm will operate much more satisfactorily if the large eigenvalues are eliminated with great accuracy. Hence the advantage of continuing the first algorithm to several blocks is not so much the increased accuracy of the solution as the proper preparation for the second process, which will then tackle the problem of small eigenvalues much more effectively. The field of vision is perhaps not much reduced. But the dim light that still spreads over the higher portion of the spectrum is more and more sharply eliminated.

Figure 1. Attenuation factors obtained by k blocks of algorithm I.

The continuation of the g-algorithm to a second block can be achieved without any basic interruption of the operations. After obtaining the residual $r_6$, we transfer this row to $B$ as an additional sixth column. The fifth column now remains inactive. Consequently, the squares 4, 9, 16, ... are now moved over by one column. The resulting scheme, now extended to two blocks, and omitting the first five lines which have been obtained before, looks as follows:

[Work scheme of the second block: rows $g_0^{(2)}$ through $g_5^{(2)}$, closing with the row $s_6^{(2)}$: 4, 9, 12, 9.]


The successive blocks can be generated continuously by one mechanized algorithm. If $k$ blocks are generated, the approximation becomes

$y = 4\left(\frac{g_5}{49} + \frac{g_5^{(2)}}{49^2} + \cdots + \frac{g_5^{(k)}}{49^k}\right).$

In our numerical example the two contributions and their sum become:

$\frac{4}{49}\,g_5$ = 0.65306, 1.30613, 2.04082, 2.93879
$\frac{4}{49^2}\,g_5^{(2)}$ = 0.12162, 0.24990, 0.31154, 0.22990
sum = 0.77468, 1.55603, 2.35236, 3.16869
($y$ = 0.8, 1.6, 2.4, 3.2)

If we perform the ratio test (108) once more on the second block, we find

$\frac{|s_6^{(2)}|}{|g_5^{(2)}|} = \frac{17.944}{286.08} = 0.0627 < 0.0816.$
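The two-block computation can be retraced in a few lines. This is a sketch under stated assumptions, not the original work scheme: the matrix of fourth order is taken to be the tridiagonal matrix with 2 on the diagonal and -1 beside it (inferred from the eigenvalue $2(1-\cos 36^\circ)$ and the solution 0.8, 1.6, 2.4, 3.2 quoted for this example), the norming factor is taken as 4, so that $B = 2I - A$, and the recurrence $g_k = B g_{k-1} - g_{k-2} + (k+1)^2 b_0$ is reconstructed from the quoted numbers. The names `apply_B` and `g_block` are hypothetical.

```python
import math

def apply_B(v):
    """B = 2I - A for A = tridiag(-1, 2, -1): each entry becomes the sum of its neighbours."""
    n = len(v)
    return [(v[i - 1] if i > 0 else 0) + (v[i + 1] if i < n - 1 else 0) for i in range(n)]

def g_block(b0, m=5):
    """One block of m iterations: returns g_m and s_{m+1} = (m+2)^2 b0 + B g_m - 2 g_m."""
    g_prev, g = [0] * len(b0), list(b0)            # g_{-1} = 0, g_0 = b0
    for k in range(1, m + 1):                      # g_k = B g_{k-1} - g_{k-2} + (k+1)^2 b0
        Bg = apply_B(g)
        g_prev, g = g, [Bg[i] - g_prev[i] + (k + 1) ** 2 * b0[i] for i in range(len(b0))]
    Bg = apply_B(g)
    s = [(m + 2) ** 2 * b0[i] + Bg[i] - 2 * g[i] for i in range(len(b0))]
    return g, s

def norm(v):
    return math.sqrt(sum(x * x for x in v))

b0 = [0, 0, 0, 1]              # right side (0, 0, 0, 4) after division by the norming factor 4
y = [0.0] * 4
for block in (1, 2):
    g, s = g_block(b0)
    y = [y[i] + 4 * g[i] / 49 ** block for i in range(4)]
    print("block", block, [round(v, 5) for v in y])
    b0 = s                     # the scaled residual is the right side of the next block
print("ratio test, last block:", round(norm(s) / norm(g), 4))
```

Under these assumptions the accumulated approximation after two blocks agrees with the sum listed above to within rounding in the fifth decimal, and the last row of the second block comes out as 4, 9, 12, 9.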





Hence the inequality (86), multiplied by the factor 4, can still be verified. We can expect that, as we come to higher and higher blocks, the ratio test will eventually fail. The initial vectors of the successive blocks become more and more purified of the larger eigenvalues. As a consequence, the purification process, which leaves the very small eigenvalues untouched, becomes less and less effective. Eventually the polynomial $g_m(A)$ will operate on an initial vector $b_0$ which contains only small eigenvalues. We will then approach the extreme case

$g_m(A)\,b_0 = g_m(0)\,b_0 = \frac{(m+2)^2[(m+2)^2-1]}{12}\,b_0,$

while $s_{m+1}$ approaches $(m+2)^2 b_0$. The ratio test then gives

$\frac{|s_{m+1}|}{|g_m|} = \frac{12}{(m+2)^2-1},$

that is, 1/4, if $m=5$. This gives an upper bound for the ratio test, which cannot be surpassed, no matter how far the process is continued.

We now come to the analysis of the $r(\lambda)$-factor connected with algorithm II (see fig. 2). The principle by which this process gives good attenuation is quite different from the previous one. Here we take heed of the specific nature of the matrix $A$ and operate in a selective way. The polynomials $F_{m+1}(\lambda)$ of this process have the peculiarity that they attenuate due to the nearness of their zeros to those $\lambda$-values which are present in $A$. These polynomials take advantage of the fact that the spectrum to be attenuated is a line spectrum and not a continuous spectrum. They work efficiently in the neighborhood of the $\lambda_i$ of the matrix but not for intermediate values. They are thus associated with the given specific matrix $A$ and are of no use for other matrices. If we proceed to the polynomial of $n$th order $F_n(\lambda)$, the zeros of this polynomial hit all the $\lambda_i$ exactly, and thus make the entire residual vanish.

Figure 2. Attenuation behavior of algorithm II.

This analysis explains the advantages and the disadvantages of the second algorithm. The advantage of the process is its great economy. The exact solution (apart from rounding errors) is obtainable in $n$ iterations; this is the minimum number of steps for generating a polynomial that will have its zeros at the $\lambda_i$ of the matrix $A$. If the number of components present in $b_0$ is smaller than $n$, then the order of $F_n(\lambda)$ is correspondingly lower and the solution is again obtained in the minimum number of steps.

The price we have to pay is that the successive iterations of this process are more complicated than those of algorithm I. Instead of one new vector, a pair of vectors has to be generated. Moreover, the previous recurrence relation, based on the properties of the g-polynomials, had fixed coefficients, which needed no adjustments throughout the procedure. Here at every step a pair of scalars has to be evaluated which are needed for the generation of the new p, q vectors. The constants of the recurrence relations have to be readjusted at each new step of the process.

Another difficulty arises from the inevitable accumulation of rounding errors. If we want to maintain a long chain of interlocked operations, we have to counteract the effect of rounding errors. This can be done by constant reorthogonalization of the p vectors which, however, is a lengthy process. It is preferable not to correct for the rounding errors but to avoid them by breaking the long algorithm into a sequence of shorter blocks. Then, however, we lose in convergence and the number of iterations has to be extended.

The two algorithms together complement each other. The first algorithm succeeds in purifying the given vector $b_0$ of all its large eigenvalues. The spectrum is thus effectively reduced, which means that only a relatively small number of $\lambda_i$ remain practically present in the final residual. This is now the point where the second algorithm takes over. Because of the small number of eigenvectors still present in $b_0$, a polynomial of low order will be sufficient for the final elimination of the residual. The process has thus good convergence and will be finished after a small number of iterations. The breaking up of the process into blocks will not be necessary since the rounding errors will have no time to accumulate to the point where they endanger the solution. The small extension of the spectrum tends to reduce the deorthogonalizing effect of the rounding errors, thus increasing the length of a block and preventing its premature termination. The opening of a second block will thus but seldom be required.
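The statement that the number of steps equals the number of eigencomponents present in the right side can be illustrated numerically. The sketch below uses the method of conjugate gradients as a stand-in; for symmetric positive definite matrices it is closely related to the p, q algorithm, but it is not the scheme of the text. The matrix (tridiagonal with 2 and -1, order 4) and the names `matvec`, `cg_steps`, and `u` are assumptions made for the illustration.

```python
import math

def matvec(v):
    """A = tridiag(-1, 2, -1) applied to v."""
    n = len(v)
    return [2 * v[i] - (v[i - 1] if i > 0 else 0) - (v[i + 1] if i < n - 1 else 0)
            for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_steps(b, tol=1e-12, max_iter=10):
    """Conjugate gradients from x = 0; returns the number of steps until |residual| < tol."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    steps = 0
    while math.sqrt(dot(r, r)) > tol and steps < max_iter:
        Ap = matvec(p)
        alpha = dot(r, r) / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
        steps += 1
    return steps

def u(k):
    """k-th eigenvector of the assumed matrix (components sin(j*k*pi/5), j = 1..4)."""
    return [math.sin(j * k * math.pi / 5) for j in range(1, 5)]

two_components = [a + c for a, c in zip(u(1), u(4))]
print(cg_steps(two_components))          # right side containing two eigenvectors
print(cg_steps([0.0, 0.0, 0.0, 4.0]))    # right side containing all four
```

A right side built from two eigenvectors is resolved in two steps, while a right side with all four components needs the full four steps.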


6. Iterative Solution of Nearly Singular Systems


In practical numerical work we frequently encounter nearly singular systems. We shall therefore discuss the relative merits of iterative schemes and other matrix inversion methods with respect to such systems.

We begin with the extreme case when the determinant of the matrix $G$ and all its minors up to a certain order $n-\nu$ vanish exactly, thus reducing the rank of the matrix to $n-\nu$. In this case the linear system (29) is generally not solvable, except if the right side satisfies certain compatibility conditions. The reduction of the rank from $n$ to $n-\nu$ means that the left side of the system satisfies $\nu$ independent linear identities. The compatibility of the system, which is the necessary and sufficient condition for its solvability, demands that the same identities shall be satisfied by the given right sides.

If the compatibility conditions are actually satisfied and the system thus solvable, then another peculiarity arises. The solution is not unique. To any given solution an arbitrary linear combination of $\nu$ independent vectors may be added without disturbing the validity of the equations.

These theoretical conditions have to be translated into practical conditions if we want to analyze the numerical behavior of linear systems which are not exactly but nearly singular. We can base our analysis on the behavior of the eigenvalues and eigenvectors associated with the matrix $G$.

In the light of eigenvalues the lowering of the rank of the matrix $G$ from $n$ to $n-\nu$ means that the matrix $G$ possesses $\nu$ vanishing eigenvalues. Such a matrix operates in an $(n-\nu)$-dimensional subspace only and blots out all the $\nu$ dimensions which are orthogonal to this subspace. Hence the linear set (29) can only be solvable if the right side $g$ is free of all those dimensions which the matrix rejects. At the same time, the solution $y$ may contain any vector which belongs totally to the rejected portion of the $n$-dimensional space, since the operation $Gy$ extinguishes this vector and thus does not disturb the balance of the equation.

If the matrix $G$ is not exactly but nearly singular in $\nu$ directions, this means that $\nu$ of the eigenvalues, although not exactly zero, are nevertheless very small compared with the other eigenvalues. We can associate such a matrix geometrically with a strongly skew-angular frame of reference which almost collapses into a lower dimensional space. In this interpretation we conceive the successive columns of $G$ as $n$ basic vectors

$V_1, V_2, \ldots, V_n,$ (114)

which establish an $n$-dimensional set of axes. The linear system $Gx=g$ now assumes the following significance:

$V_1 x_1 + V_2 x_2 + \cdots + V_n x_n = g.$ (115)

This means that the given vector $g$ shall be analyzed in the reference system of the base vectors $V_i$.

Now the skew-angular character of a frame of axes can be properly described by evaluating the volume included by these axes. This again is nothing but the determinant $|G|$ of the matrix $G$. The smaller the included volume, the more skew-angular is the system. However, this measure is adequate only if the various axes of our reference system are properly scaled. Otherwise even an orthogonal set of axes can have a very small determinant, caused not by the inclination of the axes, but by uneven scaling.







This uneven scaling can always be eliminated by the following linear transformation of the variables $x_i$:


(116) 


Then the original equation (115) now appears in the following form


Vin (117) 
where 
(118) 


and thus 
119) 


In matrix language the transformation (116) means 
that the columns of the matrix G=(qg,) are multiplied 
by > 


(120) 


and the right side by 
l 


>. 


Vo=' 
a=1 


which transforms the vector z« into 


Yn 


Yi 
Yn+1 


(122) 


Yi 


The consequence of this transformation on the symmetrized matrix $A$ is that all the diagonal elements become 1, while all the nondiagonal elements range between $\pm 1$. This is of great advantage from the viewpoint of numerical operations [15].
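The effect of the column scaling on the symmetrized matrix can be checked on a small example. The sketch below assumes that each multiplier is the reciprocal of the Euclidean length of the corresponding column, which is the choice that produces unit diagonal elements; the matrix `G` is an arbitrary illustration with badly scaled columns and is not taken from the text.

```python
import math

G = [[1.0, 200.0, 0.03],
     [2.0, -100.0, 0.01],
     [0.5, 300.0, -0.02]]       # hypothetical matrix with unevenly scaled columns
n = len(G)

# Assumed multipliers: gamma_k = 1 / (length of column k)
gamma = [1.0 / math.sqrt(sum(G[a][k] ** 2 for a in range(n))) for k in range(n)]
Gs = [[G[a][k] * gamma[k] for k in range(n)] for a in range(n)]

# Symmetrized ("least-square") matrix of the rescaled system: A = Gs^T Gs
A = [[sum(Gs[a][i] * Gs[a][k] for a in range(n)) for k in range(n)] for i in range(n)]
for row in A:
    print(["%+.4f" % v for v in row])
```

The diagonal of `A` consists of ones, and by the Cauchy-Schwarz inequality no off-diagonal element can exceed 1 in absolute value.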

If the original matrix is already given as a positive 
definite, symmetric matrix A, then the scaling of the 
matrix is performed by the transformation 


&;. (123) 


q, 


Vii 


We multiply all the rows, and then all the columns, by $1/\sqrt{a_{ii}}$, which makes the resulting diagonal elements once more equal to 1. Moreover, the vector $g$ is transformed into the vector $b$ by the transformation

$b_i = \frac{g_i}{\sqrt{a_{ii}}}.$ (124)


Finally, the length of this vector is normalized to 1 
by putting 


bly; (125) 


& 


b 


bo=—- (126) 


* The conditions (120) and (121) need not be met with any high degree of precision. The multipliers $\gamma_i$ can be rounded off to two significant figures.





We now consider the vector equation (117). The smallness of the determinant $|G|$ associated with the rescaled system now actually measures the strongly skew-angular nature of our reference system. Nevertheless, the linear equation (117) can be considered as well adjusted if the right side $g_0$ falls inside the narrow space included by the basic vectors $V_i$. This condition is a natural counterpart of the compatibility conditions set up for the case that the vectors eventually collapse completely into a lower dimensional space. If the right side lies constantly inside the space included by the basic vectors, then it remains coplanar with those vectors even in the limit when the vectors do not include any finite volume any more. Practical compatibility includes thus the limiting case of theoretical compatibility. Let us examine in what form this condition of "insidedness" comes into evidence in relation to the least-squared matrix $A$ and its right side $b_0$.

Let us project the vector $b_0$ on the principal axes of $A$. We obtain the components $\beta_{i0}$. Let us divide each one of these components by the eigenvalue $\lambda_i$ associated with that axis. This gives the sequence

$\frac{\beta_{10}}{\lambda_1},\ \frac{\beta_{20}}{\lambda_2},\ \ldots,\ \frac{\beta_{n0}}{\lambda_n}.$ (127)

We pick out the absolutely largest of these numbers and consider

$\mu = \max \frac{|\beta_{i0}|}{\lambda_i}$ (128)

as the measure of the adjustment of the given system. No matter how small the determinant of $A$ is, the linear equation $Ay=b$ can be considered as solvable practically if $\mu$ is a reasonably small number. The measure $\mu$ does not refer in any way to the condition of $A$ itself. It measures the relation of the right side of the system to the left side. The meaning of a reasonably small $\mu$ is that the near identities which exist on the left side lead to near identities also on the right side.

As a consequence of (117) we have

$|\beta_{i0}| \le \mu\lambda_i.$ (129)

Let us collapse the given frame of axes more and more into a lower dimensional system, but keep $\mu$ bounded. Then in the limit a certain number $\nu$ of the $\lambda_i$ vanish. However, as a consequence of (129), the corresponding $\beta_{i0}$ vanish too. This is exactly the compatibility requirement of a singular system. The measure $\mu$ is thus a reasonable measure of the adjustment of the given linear system.

If we are able to invert a matrix exactly, then the smallness or largeness of $\mu$ is of no importance. If, however, approximation techniques are employed, then it is natural to restrict ourselves to well adjusted systems whose $\mu$ is not too large. We cannot expect that any approximation procedure shall remain successful if $\mu$ becomes arbitrarily large, since in that case a minute change in the right side may cause a large error in the solution. For the same reason we can add at once that physical systems, whose right sides are given as the result of observations, must satisfy the condition of not too large $\mu$, in order to allow any valid conclusions.

We will thus restrict ourselves to the solution of systems that can be considered as "well adjusted" in the sense of prescribing for $\mu$ a not too large upper bound. The length of our approximation procedure will depend on the magnitude of $\mu$. If $\mu$ is too large, then we have to abandon the use of iteration techniques, or we have to employ the full technique of minimized iterations with all its precautions, continuing to the very end of $n$ iterations.
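The measure $\mu$ of (128) is easy to evaluate when the principal axes are known. The following sketch assumes, purely for illustration, the tridiagonal matrix of fourth order with 2 on the diagonal and -1 beside it, whose eigenvalues and eigenvectors are available in closed form; no rescaling of the matrix is applied, and the function name `adjustment_measure` is hypothetical.

```python
import math

n = 4
# Closed-form principal axes of the assumed matrix A = tridiag(-1, 2, -1)
eigenvalues = [2 * (1 - math.cos(k * math.pi / (n + 1))) for k in range(1, n + 1)]
eigenvectors = [[math.sqrt(2 / (n + 1)) * math.sin(j * k * math.pi / (n + 1))
                 for j in range(1, n + 1)] for k in range(1, n + 1)]

def adjustment_measure(b0):
    """mu = max_i |beta_i0| / lambda_i, where beta_i0 is the component of b0 along axis i."""
    ratios = []
    for lam, axis in zip(eigenvalues, eigenvectors):
        beta = sum(bi * ui for bi, ui in zip(b0, axis))
        ratios.append(abs(beta) / lam)
    return max(ratios)

print(adjustment_measure([0.0, 0.0, 0.0, 1.0]))   # all four axes are excited
print(adjustment_measure(eigenvectors[3]))        # only the largest eigenvalue is excited
```

For the first right side the maximum is reached on the axis of the smallest eigenvalue; for the second, the component along the small eigenvalues vanishes and $\mu$ reduces to the reciprocal of the largest eigenvalue.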

Singular systems, however, show a second peculiarity, namely, the indeterminate character of the solution. Let us examine what the corresponding phenomenon is in the case of nearly singular, that is, strongly skew-angular systems. The corresponding phenomenon is that very small changes on the right side cause much larger changes in the solution. The danger exists solely in the direction of the small eigenvalues, and is caused by the fact that the component $\beta_{i0}$ of the right side in the direction of the $i$th eigenvector has to be divided by $\lambda_i$ in order to get $y_{i0}$.

This phenomenon is of considerable significance if we are interested in the solution of linear systems which arise from physical measurements. Let us assume that we know in advance from physical reasons that the given system is well adjusted, that is, that $\mu$ is reasonably small, compared with the accuracy of the measurements. Then an appearance of a large $y_{i0}$ on account of dividing by a small $\lambda_i$ must be caused by experimental errors and should be discarded. In such a situation the use of an iteration technique for finding the solution is superior to the exact solution. The exact solution, obtained by matrix inversion, would be of little help, since it would not separate the influence of the errors in the direction of the small $\lambda_i$. On the other hand, if we use the above advocated method of taking out first the contribution of the large eigenvalues by the g-polynomials, then we can actually separate the desirable part of the solution from the undesirable part. The first approximation, which leaves the small eigenvalues practically untouched, does not offer any difficulty and can stand as it is. Now we come to the second algorithm, which determines the contribution of the small eigenvalues. If in this successive approximation process a correction appears, the length of which is more than $\mu$ times the length of the remaining residual, we know that we should stop at this point, since this contribution comes from the errors of the data.
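The amplification of data errors along a small eigenvalue, and the effect of discarding the offending component, can be shown on a toy system. This is only an illustration working directly in the principal axes, not the iterative procedure of the text; the $2\times 2$ matrix with unit diagonal and off-diagonal element `c`, the bound `mu_bound`, and the function name `solve_by_axes` are all hypothetical.

```python
import math

c = 0.999                                   # off-diagonal element; eigenvalues are 1 + c and 1 - c
lam_large, lam_small = 1 + c, 1 - c
u_large = (1 / math.sqrt(2), 1 / math.sqrt(2))
u_small = (1 / math.sqrt(2), -1 / math.sqrt(2))

def solve_by_axes(b, mu_bound=None):
    """Solve A y = b in the principal axes; drop components with |beta/lambda| > mu_bound."""
    y = [0.0, 0.0]
    for lam, axis in ((lam_large, u_large), (lam_small, u_small)):
        beta = b[0] * axis[0] + b[1] * axis[1]
        coeff = beta / lam
        if mu_bound is not None and abs(coeff) > mu_bound:
            continue                        # attributed to errors of the data and discarded
        y = [y[0] + coeff * axis[0], y[1] + coeff * axis[1]]
    return y

eps = 1e-3                                  # small observational error along the small eigenvector
b_clean = list(u_large)
b_noisy = [b_clean[0] + eps * u_small[0], b_clean[1] + eps * u_small[1]]

y_clean = solve_by_axes(b_clean)
y_noisy = solve_by_axes(b_noisy)                    # exact solution of the perturbed system
y_filtered = solve_by_axes(b_noisy, mu_bound=0.75)  # large component rejected
print(y_clean, y_noisy, y_filtered)
```

An error of size 0.001 in the right side changes the exact solution by a vector of length about 1, whereas the filtered solution stays at the undisturbed one.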

This analysis indicates that in the case of strongly skew-angular but well-adjusted physical systems the separation of the two algorithms has more than technical significance. It makes smoothing of the data possible by discarding large errors in the solution caused by small observational errors in the direction of the small eigenvectors.* The iteration technique gives in such a case a more adequate solution than the mathematically exact solution obtained by matrix inversion, because it capitalizes on the sluggishness with which the small eigenvalues come into play. The smallest eigenvalues, which essentially test the compatibility of the system, appear last. Now the given system is such that this test of compatibility is not needed since we know in advance from physical considerations that the system is well adjusted. By omitting the contents of the last equations we take advantage of the good part of our measurements and reject the errors. While the uncertainty of the solution is not completely eliminated by this procedure, it is nevertheless essentially reduced in magnitude.

* The expression "small eigenvector" is used in the sense of "an eigenvector associated with a small eigenvalue."


7. Eigenvalue Analysis 


The underlying principles of the two algorithms 
discussed in the previous sections can also be 
employed in the problem of finding the eigenvalues 
and eigenvectors of a matrix. The general p, q, p*, q*
algorithm gives a complete analysis of the matrix, 
namely it gives all its eigenvalues and eigenvectors. 
If performed with the proper care, this method gives 
satisfactory results even when the eigenvalues are 
closely grouped [16]. 

However, in many situations we are not interested 
in the complete set of eigenvalues and eigenvectors. 
We would welcome a technique which puts the spot- 
light on a few eigenvectors only, or we might want to 
single out just one particular eigenvalue and its
associated eigenvector, for example, the smallest one. 
The method now to be outlined should prove useful 
in connection with such problems. 

The preliminary purification of $b_0$ served the
purpose of increasing the convergence of the final 
algorithm by properly preparing the vector on which 
it operates. We were able to effectively eliminate 
all components of the original vector except those 
associated with the small eigenvalues. 

After the purification, the spotlight is put on the 
small eigenvalues; we will therefore first obtain the 
small eigenvalues and the associated eigenvectors 
with great accuracy, in marked contrast to the 
Sylvester-Cayley asymptotic procedure which first 
obtains the absolutely largest eigenvalue and its
associated eigenvector. 

In ‘‘flutter’’ problems we are usually interested in 
the smallest eigenvalues of the given matrix. In 
order to apply the asymptotic power method, we 
first invert the matrix, thus transforming the smallest 
eigenvalues to the largest eigenvalues of the new 
matrix. If we possess a direct method for the 
evaluation of the smallest eigenvalues, we might 
dispense with the preliminary inversion of the 
matrix, thus saving a great deal in numerical effort. 

However, our previous purification procedure, 
based on the properties of the Chebyshev polyno- 
mials, is strictly limited to nonnegative matrices and 
cannot be generalized to arbitrary complex eigen- 
values, because the outstanding properties of the 
Chebyshev polynomials are not preserved in the 
complex range. We will now see that the general 
eigenvalue problem of an arbitrary complex matrix 
can always be formulated in such a way that it 
becomes transformed into the determination of the 







smallest eigenvalue and eigenvector of a nonnegative 
Hermitian matrix. 

Let us first observe that all our previous procedures remain valid if we apply them to a nonnegative Hermitian matrix

$\bar{A}^{*} = A,$ (130)

where $A^{*}$ is the transpose and $\bar{A}$ is the conjugate of $A$. The quadratic form associated with a Hermitian matrix is still real.

We consider the solution of the linear equation

$Gy = g,$ (131)

where the matrix $G$ is a general matrix with complex elements; the vector $g$ has likewise complex elements. We multiply on both sides by $\bar{G}^{*}$ and obtain once more the standard form

$Ay = b$ (132)

with

$A = \bar{G}^{*}G$ (133)

and

$b = \bar{G}^{*}g.$ (134)

The matrix $A$ defined by (133) is not only Hermitian but also nonnegative.
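That the matrix (133) is Hermitian and nonnegative can be verified on a small complex example. The sketch below is an illustration only; the entries of `G`, `g`, and the test vector `v` are arbitrary, and the helper name `quadratic_form` is hypothetical.

```python
G = [[1 + 2j, 0.5 - 1j],
     [3j, 2 + 0j]]              # arbitrary complex matrix
g = [1 - 1j, 2 + 0.5j]          # arbitrary complex right side
n = len(G)

# A = conj(G)^T G and b = conj(G)^T g
A = [[sum(G[a][i].conjugate() * G[a][k] for a in range(n)) for k in range(n)]
     for i in range(n)]
b = [sum(G[a][i].conjugate() * g[a] for a in range(n)) for i in range(n)]

def quadratic_form(v):
    """conj(v)^T A v, which equals |G v|^2 and is therefore real and nonnegative."""
    Av = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return sum(v[i].conjugate() * Av[i] for i in range(n))

v = [0.3 - 0.7j, -1.1 + 0.2j]
q = quadratic_form(v)
Gv = [sum(G[a][k] * v[k] for k in range(n)) for a in range(n)]
print(A)
print(q, sum(abs(z) ** 2 for z in Gv))
```

The quadratic form has a vanishing imaginary part and coincides with the squared length of $Gv$, which is why it can never be negative.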

All the characteristic features of the previous 
algorithms remain the same. The largest eigenvalue 
$\lambda_M$ can once more be estimated by Geršgorin's
theorem. The g-algorithm carries over without any 
modification, although all the vectors involved have 
now complex elements. 

The p, q algorithm can also be carried over with the only modification that the adjoint vectors $p^{*}$, $q^{*}$ are now not identical with $p$, $q$ but with $\bar{p}$, $\bar{q}$. Hence the basic scalars $h_i$ and $h_i'$ of the algorithm have to be defined as follows:


h, PP 

(135) 
hi=paq=pdq. 
We see from these relations that the $h_i$ are again all positive; moreover, the $h_i'$ are all real. Actually, the theory of the basic algorithm [14], section 6, allows a further conclusion. The significance of the $h_i$ and $h_i'$ within the framework of this algorithm reveals that for nonnegative Hermitian matrices not only the $h_i$ but also the $h_i'$ remain positive. Hence, in spite of the complex nature of the vector elements, the reality (and even positiveness) of the basic scalars remains preserved.

Let us now consider the eigenvalue problem connected with an arbitrary nonsymmetric and complex matrix $K$:

$(K - \lambda I)\,y = 0.$ (136)

We put

$G = K - \lambda I,$ (137)

and write the equation

$Gy = 0$ (138)

in the "least square" form

$\bar{G}^{*}Gy = 0.$ (139)

This introduces the Hermitian matrix

$A = \bar{G}^{*}G = \bar{K}^{*}K - (\bar{\lambda}K + \lambda\bar{K}^{*}) + \lambda\bar{\lambda}I.$ (140)

There is generally no predictable relation between the eigenvalues of an arbitrary matrix and its "least-square" form. Yet there is one exception, namely the eigenvalue zero. The eigenvalue zero of $G$ carries over to the Hermitian matrix $A$. Let us now assume that we want to operate solely with the Hermitian matrix $A$ and abandon the original matrix $K$ completely. Then we can still obtain all the eigenvalues of $K$ by determining all those values of $\lambda$ in (140) which make the smallest eigenvalue of $A$ equal to zero.

We now see how we can make good use of a method which discriminates in favor of the small eigenvalues. Such a method can be utilized to put the emphasis on one particular eigenvector, instead of an arbitrary mixture of eigenvectors.

Generally, if we start the p, q algorithm with some arbitrary $b_0$, $b_0^{*}$ vector, we have no control over the sequence in which the successive eigenvectors and eigenvalues will be approximated. The particular eigenvector in question might appear quite late in the process. Let us assume, however, that we succeed in purifying the trial vector $b_0$, $b_0^{*}$ of most of its components and emphasize strongly one particular eigenvector in which we are interested.

Such conditions actually arise if we possess a first approximation $\lambda_0$ to the desired eigenvalue $\lambda$. We can now form the Hermitian matrix (140) with this particular $\lambda = \lambda_0$, and let us assume that we can obtain its smallest eigenvector. If $\lambda_0$ were the correct value for $\lambda$, the smallest eigenvalue would be zero and the associated eigenvector the correct solution. Since $\lambda_0$ is only an approximation, we still get a good vector which has a strong component in the desired direction. This is enough for a good start of the algorithm II.

However, our work is only half done. Since the original matrix is not symmetric, we need the complete p, q, p*, q* process. That process starts with $b_0$ and the adjoint $b_0^{*}$. So far we have obtained $b_0$ only. In order to obtain a well-suited $b_0^{*}$, we proceed as follows. We consider the adjoint solution

$(K^{*} - \lambda I)\,y^{*} = 0,$ (141)

which in "least-square" form leads to the new matrix

$A = \bar{G}G^{*} = \bar{K}K^{*} - (\bar{\lambda}K^{*} + \lambda\bar{K}) + \lambda\bar{\lambda}I.$ (142)

The third part of this matrix is identical with the previous third part; the second part differs from the previous second part only in the change of $i$ to $-i$. The first part, however, is an entirely independent new matrix, formed by multiplying the rows of $K$ by its rows, while previously the columns were multiplied by columns.

The smallest eigenvector of this new Hermitian matrix $A$ can now be introduced as a well-purified $b_0^{*}$ which will strongly emphasize the desired eigenvector. Then two steps of the p, q algorithm will give an improved eigenvector and a much improved value for $\lambda$. This method resembles Newton's method of obtaining the root of an algebraic equation if a near root is given.
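The connection between the eigenvalues of $K$ and the vanishing of the smallest eigenvalue of the Hermitian matrix (140) can be observed on a small case. The sketch below uses a hypothetical nonsymmetric $2\times 2$ matrix with eigenvalues 2 and 3, for which the smallest eigenvalue of $A$ is available in closed form from the trace and the determinant; the function name is likewise hypothetical.

```python
import math

K = [[2.0, 1.0],
     [0.0, 3.0]]      # hypothetical nonsymmetric matrix with eigenvalues 2 and 3

def smallest_eigenvalue_of_A(lam):
    """Smallest eigenvalue of A = conj(G)^T G with G = K - lam*I (2x2 case, closed form)."""
    G = [[K[0][0] - lam, K[0][1]],
         [K[1][0], K[1][1] - lam]]
    A = [[sum(complex(G[a][i]).conjugate() * G[a][k] for a in range(2)) for k in range(2)]
         for i in range(2)]
    trace = (A[0][0] + A[1][1]).real
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]).real
    return (trace - math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2

# The smallest eigenvalue of A vanishes exactly at the eigenvalues of K
for lam in (2.0, 2.1, 2.5, 3.0, 1 + 1j):
    print(lam, smallest_eigenvalue_of_A(lam))
```

A trial value close to an eigenvalue of $K$, such as 2.1, gives a small but positive smallest eigenvalue, which is the situation exploited when a first approximation $\lambda_0$ is available.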

The problem is thus reduced to the problem of finding the smallest eigenvector of a Hermitian matrix. Our aim is to purify a trial vector $b_0$ of all its large eigenvalues, reducing it to a new vector in which the smallest eigenvector is strongly emphasized.

This was accomplished before in form of the residual of the previous g-process. There the attenuation obtained was characterized by the $k$th power of a certain function $r(x)$, if $k$ blocks of the process were employed. As figure 1 illustrates, increasingly strong attenuations are obtainable even with a few blocks of five iterations. Since in our case the solution $y$ is of no importance but only the residual, we can generate that residual immediately by utilizing the $F_{m+1}(x)$ polynomials. We multiply by $(m+2)^2$, in order to get integer coefficients. Hence we want to operate with the polynomials

$f_{m+1}(x) = (m+2)^2 F_{m+1}(x).$ (143)

These polynomials once more satisfy a simple recurrence relation:

$f_{m+1}(x) = 2(1-2x)\,f_m(x) - f_{m-1}(x) + 2,$ (144)

which again leads to the previous algorithm

$f_{m+1} = B f_m - f_{m-1}$ (145)

with the only difference that the surplus column of the vectors $f_m$ now remains 2 throughout the process:

$f_m = f_m,\ 2.$ (146)

The matrix $B$ is once more defined as before, see (93) and (97).

The termination of a block and changing over to the next block now occurs by the following simple procedure. We go on uninterruptedly with the recurrences, until the last vector $f_{m+1}$ is reached. This vector is transferred to $B$ as the new surplus column which will be in operation during the second block. Moreover, the last vector $f_{m+1}$ becomes the initial vector $f_0^{(2)}$ of the second block. Then the algorithm starts over again until the new block is finished, which occurs at $f_{m+1}^{(2)}$, and so on.

In order to demonstrate the operation of this algorithm, we once more make use of the previous simple matrix of fourth order and choose once more $m=5$. Two blocks of six iterations are used in accordance with our previous g-algorithm, but now we generate directly the residuals. As trial vector we could use the vector 1, 1, 1, 1. However, in order not to capitalize unduly on the symmetry of our highly simplified matrix, the trial vector is chosen as 1, 1, 1, 0. The resulting work scheme looks as follows:

[Work scheme: rows $f_0$ through $f_6$ of the first block, followed by rows $f_0^{(2)}$ through $f_6^{(2)}$ of the second block.]

For checking purposes we list the first six $f_i(x)$ polynomials:

$f_1(x) = 4 - 4x$
$f_2(x) = 9 - 24x + 16x^2$
$f_3(x) = 16 - 80x + 128x^2 - 64x^3$
$f_4(x) = 25 - 200x + 560x^2 - 640x^3 + 256x^4$
$f_5(x) = 36 - 420x + 1792x^2 - 3456x^3 + 3072x^4 - 1024x^5$
$f_6(x) = 49 - 784x + 4704x^2 - 13440x^3 + 19712x^4 - 14336x^5 + 4096x^6$
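The coefficients of these polynomials follow mechanically from the recurrence (144). The sketch below generates them from the starting values $f_0 = 1$ and $f_1 = 4 - 4x$ (the latter being $(m+2)^2 F_{m+1}$ for $m=0$); the helper names `poly_mul`, `poly_add`, and `f_polynomials` are hypothetical.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists in ascending powers of x."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists in ascending powers of x."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def f_polynomials(count):
    """Coefficients of f_1 .. f_count from f_{m+1} = 2(1 - 2x) f_m - f_{m-1} + 2."""
    prev, cur = [1], [4, -4]                 # f_0 = 1, f_1 = 4 - 4x
    result = [cur]
    while len(result) < count:
        nxt = poly_add(poly_mul([2, -4], cur), [-c for c in prev])
        nxt = poly_add(nxt, [2])
        prev, cur = cur, nxt
        result.append(cur)
    return result

for k, coeffs in enumerate(f_polynomials(6), start=1):
    print(f"f_{k}:", coeffs)
```

Each list gives the coefficients in ascending powers of $x$, so the leading entry is always the square $(m+2)^2$.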


The last row of the scheme yields the vector that
is strongly graded in favor of the small eigenvalues.
In our numerical example the smallest eigenvalue of
the given matrix A is known to be

$$2(1 - \cos 36^\circ) = 0.3819660.$$

The associated eigenvector has the components

$$1,\ 2\cos 36^\circ,\ 2\cos 36^\circ,\ 1 \;=\; 1,\ 1.6180340,\ 1.6180340,\ 1.$$

If the length of this vector is normalized to 1, and
the same is done with $f_6^{(2)}$, we obtain the following
comparison:

$$\frac{f_6^{(2)}}{|f_6^{(2)}|} = .404888,\ .620828,\ .580340,\ .337407$$

$$\frac{u_1}{|u_1|} = .371748,\ .601501,\ .601501,\ .371748.$$
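The comparison above can be reproduced numerically. The Python sketch below is an illustration under stated assumptions, not the author's work scheme. The fourth-order matrix is taken to be the tridiagonal matrix with 2 on the diagonal and -1 beside it (inferred from the quoted eigenvalues $2(1-\cos 36^\circ)$ and $2(1-\cos 72^\circ)$). It is divided by 4 so that its spectrum lies between 0 and 1, and each block is started with $f_{-1} = 0$, $f_0 = b$. All function names are hypothetical.

```python
import math


def matvec(M, v):
    """Plain matrix-vector product for small dense matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def purify_block(X, b, steps):
    """Run f_{k+1} = 2(I - 2X) f_k - f_{k-1} + 2b for `steps` steps.

    Assumes f_{-1} = 0 and f_0 = b; b plays the role of the surplus column.
    """
    prev = [0.0] * len(b)
    cur = list(b)
    for _ in range(steps):
        x_cur = matvec(X, cur)
        nxt = [2.0 * (c - 2.0 * xc) - p + 2.0 * bi
               for c, xc, p, bi in zip(cur, x_cur, prev, b)]
        prev, cur = cur, nxt
    return cur


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


# Assumed test matrix: tridiagonal (-1, 2, -1) of order 4, scaled by 1/4.
A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
X = [[a / 4.0 for a in row] for row in A]

b0 = [1.0, 1.0, 1.0, 0.0]                     # trial vector from the text
first_block = purify_block(X, b0, 6)           # six iterations
second_block = purify_block(X, first_block, 6) # last vector becomes the new b
purified = normalize(second_block)

c = 2.0 * math.cos(math.radians(36.0))
u1 = normalize([1.0, c, c, 1.0])               # exact smallest eigenvector
error_length = math.sqrt(sum((p - u) ** 2 for p, u in zip(purified, u1)))

print(purified)
print(u1)
print(error_length)
```

Under these assumptions the normalized result of the second block should agree with the first row of the comparison, and `error_length` with the error quoted for that vector later in the example.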


We notice that the approximation is not very close.
However, our aim is merely to provide a good start
to the second algorithm. If we perform two cycles,
the cycles 0 and 1, of the p, q algorithm, we obtain
the following basic scalars:

$$\rho_0 = -0.38506375$$
$$\sigma_0 = -0.0080299090$$
$$\rho_1 = -1.37489569.$$


The first-order polynomial gives the solution

$$\lambda = -\rho_0 = 0.385064.$$

This is already a close approximation of the correct
$\lambda$, which is $\lambda = 0.3819660$. The second-order
polynomial gives the quadratic equation

$$\lambda^2 + (\sigma_0 + \rho_0 + \rho_1)\lambda + \rho_0\rho_1 = 0$$
$$\lambda^2 - 1.76798935\,\lambda + 0.52942249 = 0$$

whose roots are $\lambda_1 = 0.38198259$, $\lambda_2 = 1.38600677$.
The approximation to the true $\lambda_1$ is already
remarkably close, the error being only 1.7 units in
the fifth decimal place. Moreover, the second
root is a very good first approximation to the next
smallest characteristic value, which is
$2(1 - \cos 72^\circ) = 1.3819660$.

In addition, the first two cycles allow a correction
of the first principal axis, according to the formula


Ai px 
U; Pot , Ee s.. 


Po0Fo 


This gives, if again the length is normalized to 1,
the components

.3713944, .6025945, .6003686, .3721606.

The length of the error vector is $1.66 \cdot 10^{-3}$, a
strong improvement compared with the error of
$f_6^{(2)}$, which was $5.57 \cdot 10^{-2}$.





This example demonstrates that we have no difficulty
in improving a given first approximation $\lambda_0$ of an
eigenvalue; moreover, we obtain a good approximation
to the eigenvector associated with that eigenvalue.
Hence the problem is reduced to the question
of obtaining a good first approximation of a certain
desired $\lambda$. Usually it is the $\lambda$ of smallest absolute
value in which we are primarily interested.

We can now proceed as follows. For a first crude
approximation we put $\lambda = 0$ and apply the purification
process to the two Hermitian matrices (140) and (142).
The two vectors thus obtained may be too crude to be
useful as starting vectors of the p, q algorithm. It may
be preferable to improve this approximation by a least
squares method now to be explained. If we had the
right y, we could obtain the right $\lambda$ from the condition
(136). Since we do not possess the right y, we can
still obtain a preliminary $\lambda$ by minimizing the square
of the length, that is, $\bar{k}k$, of the vector $k = (K - \lambda I)y$.
This gives one complex $\lambda$. Another complex $\lambda = \lambda^*$ is
obtainable from the adjoint problem $k^* = (K^* - \lambda^* I)y^*$,
again minimizing the square of the length of this
vector. While for the correct $\lambda$ the two values $\lambda$ and
$\lambda^*$ should coincide, this is not necessarily true for the
approximations. We now use the approximation $\lambda$
as the $\lambda_0$ of the process above for obtaining $b_0$, and $\lambda^*$
as the $\lambda_0$ for obtaining $b_0^*$.
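Minimizing the squared length of $k = (K - \lambda I)y$ over complex $\lambda$ has the closed-form solution $\lambda = (\bar{y} \cdot Ky)/(\bar{y} \cdot y)$. This is the standard least squares result, given here as a reminder rather than quoted from the text. A minimal Python sketch with hypothetical helper names and an arbitrary small test matrix:

```python
def matvec(K, y):
    """Plain matrix-vector product; entries may be complex."""
    return [sum(K[i][j] * y[j] for j in range(len(y))) for i in range(len(K))]


def least_squares_lambda(K, y):
    """Return the complex lambda that minimizes |(K - lambda I) y|^2."""
    Ky = matvec(K, y)
    numerator = sum(complex(yi).conjugate() * kyi for yi, kyi in zip(y, Ky))
    denominator = sum(abs(yi) ** 2 for yi in y)
    return numerator / denominator


def residual_squared(K, y, lam):
    """Squared length of the vector k = (K - lam I) y."""
    Ky = matvec(K, y)
    return sum(abs(kyi - lam * yi) ** 2 for yi, kyi in zip(y, Ky))


# Arbitrary illustrative data, not taken from the paper.
K = [[2, 1j], [0, 3]]
y = [1, 1]

lam = least_squares_lambda(K, y)
print(lam, residual_squared(K, y, lam))
```

The adjoint estimate $\lambda^*$ is obtained the same way by calling the function with the adjoint matrix and the vector $y^*$.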

If we have not been successful in our start and
obtained too slow a convergence in the ensuing p, q
process, we can at any point of the process speed up the
convergence by applying the purification procedure
again, but now using for $\lambda_0$ the absolutely smallest
root of the last characteristic equation.

The following interesting problem offers itself. Let
$\lambda = \lambda_0$ be a good approximation of an eigenvalue of the
arbitrary matrix K. Then forming the Hermitian
matrices (140) and (142) with this $\lambda$ and obtaining
the smallest eigenvectors of these matrices, these
vectors will have a strong component in the direction of
the principal axis u, u* of the matrix K, associated
with that particular $\lambda$. The first cycle of the p, q
algorithm will then bring us closer to the true value
of $\lambda$, and two cycles will improve further and give a
good correction to the vector u, u*. But what can
we say about the second root of the characteristic
equation? Can we assume, in analogy with the
behavior of symmetric matrices, that our initial vector
is not only close but also well graded, that is, that the
second root will be a good approximation of the $\lambda$ that
is nearest in the complex plane to the first $\lambda$? This
question requires further discussion which cannot be
given here.

In this section we have merely sketched a method 
for obtaining the eigenvalues of an arbitrary complex 
matrix. However, no extensive numerical experi- 
ments have been performed so far. The writer hopes 
to go into further details about the method at some 
future time. 


8. Summary 


The present investigation advocates a combination
of two procedures for the solution of large scale linear
systems of equations. The first procedure eliminates
the contribution of the large eigenvalues, the second
the contribution of the small eigenvalues. The first
algorithm has the advantage that it operates with a
constant routine which does not change throughout
the process. The second algorithm is more lengthy
and requires corrections to counteract the accumulation
of rounding errors. Hence it is of advantage to
cut down the length of this algorithm to a minimum;
this is achieved by the application of the preceding
algorithm.

The final work scheme can be systematized into
three distinct phases:

(a) Rescaling of the columns of the given matrix
G by normalizing the length of each column to
approximately 1. This makes the diagonal elements
of the associated Hermitian matrix A nearly equal to
1, and all the nondiagonal elements numerically less
than 1.

(b) Purification of the given right side by eliminating
all its components in the direction of the large
eigenvectors of A; a two-block scheme of five iterations
each eliminates practically 90 percent of the $\lambda$ spectrum.
An additional block of five iterations eliminates about
94 percent of the spectrum. In this algorithm every
iteration generates one new vector, by a recurrence
scheme which has fixed coefficients involving the last
vector and its penultimate.

(c) The remaining components in the direction of
the small eigenvalues are eliminated by an algorithm
which is again based on recurrences. However,
every cycle now requires the generation of a pair of
vectors, called p and q, apart from the matrix
multiplication applied to q. Thus every cycle consists of
three vectors. The recurrence relations involve the
generation of two scalars in each cycle. In absence
of rounding errors the first vectors (called $p_i$) of
every cycle form an orthogonal set of vectors, while
the second and third vectors are biorthogonal to each
other. In view of the deorthogonalizing effect of
rounding errors we check from time to time the
orthogonality of the vectors obtained and interrupt
the process if the orthogonality is no longer sufficiently
strong. We then form the residual and start an
independent second block of approximations. The
solution is obtained as a given linear combination of
the q-vectors and can be generated along with the
other vectors, by constantly adding one more
correction.
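Phase (a) can be illustrated with a short Python sketch. It assumes a real matrix G and takes the associated matrix to be A = G^T G; both choices, the function names, and the sample matrix are assumptions made for the illustration only. The columns are scaled to exactly unit length here, whereas the text asks only for approximately 1.

```python
import math


def rescale_columns(G):
    """Scale every column of G to unit Euclidean length.

    Returns the scaled matrix and the list of original column lengths.
    """
    rows, cols = len(G), len(G[0])
    lengths = [math.sqrt(sum(G[i][j] ** 2 for i in range(rows)))
               for j in range(cols)]
    scaled = [[G[i][j] / lengths[j] for j in range(cols)] for i in range(rows)]
    return scaled, lengths


def gram(G):
    """Form A = G^T G for a real matrix G (assumed symmetrization)."""
    rows, cols = len(G), len(G[0])
    return [[sum(G[k][i] * G[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


# Arbitrary illustrative matrix, not taken from the paper.
G = [[3.0, 1.0], [4.0, 2.0]]
G_scaled, lengths = rescale_columns(G)
A = gram(G_scaled)
print(lengths)
print(A)
```

Because column j was divided by `lengths[j]`, the unknowns of the rescaled system are the original unknowns multiplied by those factors, so a solution of the rescaled system has to be divided by `lengths[j]` component by component.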

This method is not recommended when the principal
aim is the evaluation of the elements of the
inverse matrix, because it depends primarily on
considering the matrix together with the given right side
as a unified system. It is true that the method of
minimized iterations can be adapted to arbitrary
right sides (which is equivalent to inverting a matrix).
This is so in spite of the fact that the basic vectors are
obtained with the aid of one specific right side.
However, the convergence of the process changes greatly
with the given right side. For an arbitrary right side
we have to assume that the process does not end
before n steps. This requires that we have to generate
a complete set of basic vectors. But then constant
reorthogonalization is required, which is a
lengthy procedure. The simple successive
orthogonalization of the columns of the matrix, which also
yields the inverted matrix and does not require any
matrix multiplication, is preferable for this purpose.

In a given problem the inverted matrix will not
be required. The number of right sides with
which we have to operate may not be too large, and
thus we may prefer to repeat the algorithm for every
right side, particularly if the number of iterations
required for the given accuracy happens to be much
less than n. For example, we may imagine the
situation that a given 50 × 50 matrix is not too
skew-angular, to the extent that the symmetrized matrix
A has no eigenvalues below 0.1 of the maximum
eigenvalue. In this case a simple recurrence routine
of 10 iterations will give the solution with sufficient
accuracy, while the inversion of the matrix may
require a much more elaborate calculation. A further
advantage arises in the case of strongly skew-angular
but "well-adjusted" physical systems. Here
it is of definite advantage to separate the contribution
of the large from that of the small eigenvalues
because we can thus ameliorate the damaging
influence of observational errors. These errors are
greatly magnified in the theoretically exact
mathematical solution, while in the iteration procedure
they come into evidence only in the latest phase of
the calculations, and that phase can be discarded.

The literature on the iterative solution of linear
equations is very extensive (see [8] for the older
literature, and [2] and [1] for the newer literature on the
subject). During the last few years many iterative
schemes have been investigated. Among those
developed at the National Bureau of Standards the
gradient method of Hestenes and its modifications
[11, 17] deserve particular attention, together with
the asymptotic acceleration technique of Forsythe
and Motzkin [7]. There is also the Monte Carlo
method of Forsythe and Leibler [6]. The latest
publication of Hestenes [10] and of Stiefel [18] is
closely related to the p, q algorithm of the present
paper, although developed independently and from
different considerations.

The present investigation is based on years of
research concerning the behavior of linear systems,
starting with the author's consulting work for the
Physical Research Unit of the Boeing Airplane
Company, and continued under the sponsorship of the
National Bureau of Standards. The author is
indebted to Miss Lillian Forthal for her excellent
assistance in the extensive numerical experiments that
accompanied the various phases of theoretical
deductions. The author is likewise indebted to the
administration of the Institute for Numerical Analysis
and the Office of Naval Research for the generous
support of his scientific activities.


9. References 


[1] W. E. Arnoldi, Quart. Applied Math. 9, 17 to 30 (1951).

[2] E. Bodewig, Koninkl. Nederland. Akad. Wetenschap.
Proc. 50, 930 to 941, 1104 to 1116, 1285 to 1295 (1947);
51, 53 to 64, 211 to 219 (1948).

[3] A. Brauer, Duke Math. J. 13, 387 to 395 (1946).

[4] D. A. Flanders and G. Shortley, J. Applied Phys. 21, 1326
to 1332 (1950).

[5] G. E. Forsythe, Classification and bibliography of
methods of solving linear equations.

[6] G. E. Forsythe and R. A. Leibler, MTAC 4, 127 to 129
(1950).

[7] G. E. Forsythe and T. S. Motzkin, On a gradient method
of solving linear equations; multilithed outline at
National Bureau of Standards, Los Angeles, Calif.

[8] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary
Matrices (Cambridge University Press, 1938);
(MacMillan, New York, N. Y., 1947).

[9] S. Geršgorin, Izvest. Akad. Nauk SSSR 7, 672 to 675
(1931).

[10] M. R. Hestenes, Iterative methods for solving linear
equations, NAML Report 52-9.

[11] M. R. Hestenes and M. L. Stein, The solution of linear
equations by minimization.

[12] A. S. Householder, Am. Math. Monthly 57, 453 to 459
(1950).

[13] C. Lanczos, J. Math. Phys. 17, 123 to 199 (1938).

[14] C. Lanczos, J. Research NBS 45, 255 to 282 (1950) RP
2133.

[15] J. v. Neumann and H. H. Goldstine, Bull. Am. Math.
Soc. 53, 1021 to 1099 (1947).

[16] J. B. Rosser, M. R. Hestenes, W. Karush, and C. Lanczos,
J. Research NBS 47, 291 (1951) RP2256.

[17] M. L. Stein, Gradient methods in the solution of systems
of linear equations, NAML Report 52-7.

[18] E. Stiefel, Z. ang. Math. Phys. (Zürich, Techn.
Hochschule) 3, 1 to 33 (1952).

[19] G. Szegö, Orthogonal Polynomials (Am. Math. Soc.,
New York, N. Y., 1939).

[20] O. Taussky, Am. Math. Monthly 56, 672 to 675 (1949).


Los Angeles, September 28, 1951.





